[PATCH 00/30] Initial support for multiple hash functions

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 00/30] Initial support for multiple hash functions
@ 2023-09-27 19:49 Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 01/30] object-file-convert: Stubs for converting from one object format to another Eric W. Biederman
                   ` (31 more replies)
  0 siblings, 32 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson

I have been going over and over this patchset trying to figure
out if it is ready to be merged.  I don't know of any deficiencies
so it is at a point it could benefit from a set of eyes that
are not mine.

I had planned to wait a little bit longer but there are some on-going
conversations that could benefit from people seeing what it means for a
repository to support two hash functions at the same time.

A key part of the hash function transition plan is a way that a single
git repository can inter-operate with git repositories whose storage
hash function is SHA-1 and git repositories whose storage hash function
is SHA-256.

This interoperability can defined in terms of two repositories one whose
storage hash function is SHA-1 and another whose storage hash function
is SHA-256.  Those two repositories receive exactly the same objects,
but they store them in different but equivalent ways.

For a repository that has one storage hash function to inter-operate
with a repository that has a different storage hash function requires
the first repository to be able produce it's objects as if they were
stored in the second hash function.

This series of changes focuses on implementing the pieces that allow
a repository that uses one storage hash function to produce the objects
that would have been stored with a second storage hash function.

The final patch in this series is the addition of a test that creates
two repositories one that uses SHA-1 as it's storage hash function
and the other that uses SHA-256 as it's storage hash function.
Identical operations are performed on the two repositories, and their
compatibility objects are compared to verify they are the same.
AKA the SHA-1 repository on the fly generates the objects store in
the SHA-256 repository, and the SHA-256 repository on the fly generates
the objects that are stored in the SHA-1 repository.

There are two fundamental technologies for enabling this.
- The ability to convert a stored object into the object the
  other repository would have stored.
- The ability to remember a mapping between SHA-1 and SHA-256 oids
  of equivalent objects.

With such technologies it is very easy to implement user facing changes.
To avoid locking git into poor decisions by accident I have done my best
to minimize the user facing changes, while still building the internal
infrastructure that is needed for interoperability.

All of this work is inspired by earlier work on interoperability by
"brian m. carlson" and some of the key pieces of code are still his.

To get to the point where I can test if a SHA-1 and a SHA-256 repository
can on the fly generate each other, I have made some small user-facing
changes.

git rev-parse now supports --output-object-format as a way to query
the internal mapping tables between oids and report the equivalent
oid of the other format.

git cat-file when given a oid that does not match the repositories
storage format will now attempt to find the oids equivalent object that
is stored in the repository and if found dynamically generate the object
that would have been stored in a repository with a different storage
hash function and display the object.

An additional file loose-object-index will be stored in ".git/objects/".

An additional option "extensions.compatObjectFormat" is implemented,
that generates and stores mappings between the oids of objects stored in
the repository and oids of the equivalent objects that would be stored
in a repository show storage format was extensions.compatObjectFormat.

Eric W. Biederman (23):
      object-file-convert: Stubs for converting from one object format to another
      oid-array: Teach oid-array to handle multiple kinds of oids
      object-names: Support input of oids in any supported hash
      repository: add a compatibility hash algorithm
      loose: Compatibilty short name support
      object-file: Update the loose object map when writing loose objects
      object-file: Add a compat_oid_in parameter to write_object_file_flags
      commit: Convert mergetag before computing the signature of a commit
      commit: Export add_header_signature to support handling signatures on tags
      tag: sign both hashes
      object: Factor out parse_mode out of fast-import and tree-walk into in object.h
      object-file-convert: Don't leak when converting tag objects
      object-file-convert: Convert commits that embed signed tags
      object-file: Update object_info_extended to reencode objects
      rev-parse: Add an --output-object-format parameter
      builtin/cat-file:  Let the oid determine the output algorithm
      tree-walk: init_tree_desc take an oid to get the hash algorithm
      object-file: Handle compat objects in check_object_signature
      builtin/ls-tree: Let the oid determine the output algorithm
      test-lib: Compute the compatibility hash so tests may use it
      t1006: Rename sha1 to oid
      t1006: Test oid compatibility with cat-file
      t1016-compatObjectFormat: Add tests to verify the conversion between objects

brian m. carlson (7):
      loose: add a mapping between SHA-1 and SHA-256 for loose objects
      commit: write commits for both hashes
      cache: add a function to read an OID of a specific algorithm
      object-file-convert: add a function to convert trees between algorithms
      object-file-convert: convert tag objects when writing
      object-file-convert: convert commit objects when writing
      repository: Implement extensions.compatObjectFormat

 Documentation/config/extensions.txt |  12 ++
 Documentation/git-rev-parse.txt     |  12 ++
 Makefile                            |   3 +
 archive.c                           |   3 +-
 builtin/am.c                        |   6 +-
 builtin/cat-file.c                  |  12 +-
 builtin/checkout.c                  |   8 +-
 builtin/clone.c                     |   2 +-
 builtin/commit.c                    |   2 +-
 builtin/fast-import.c               |  18 +-
 builtin/grep.c                      |   8 +-
 builtin/ls-tree.c                   |   5 +-
 builtin/merge.c                     |   3 +-
 builtin/pack-objects.c              |   6 +-
 builtin/read-tree.c                 |   2 +-
 builtin/rev-parse.c                 |  25 ++-
 builtin/stash.c                     |   5 +-
 builtin/tag.c                       |  45 ++++-
 cache-tree.c                        |   4 +-
 commit.c                            | 219 ++++++++++++++++-----
 commit.h                            |   1 +
 delta-islands.c                     |   2 +-
 diff-lib.c                          |   2 +-
 fsck.c                              |   6 +-
 hash-ll.h                           |   1 +
 hash.h                              |   9 +-
 http-push.c                         |   2 +-
 list-objects.c                      |   2 +-
 loose.c                             | 258 ++++++++++++++++++++++++
 loose.h                             |  22 +++
 match-trees.c                       |   4 +-
 merge-ort.c                         |  11 +-
 merge-recursive.c                   |   2 +-
 merge.c                             |   3 +-
 object-file-convert.c               | 277 ++++++++++++++++++++++++++
 object-file-convert.h               |  24 +++
 object-file.c                       | 212 ++++++++++++++++++--
 object-name.c                       |  49 +++--
 object-name.h                       |   3 +-
 object-store-ll.h                   |   7 +-
 object.c                            |   2 +
 object.h                            |  18 ++
 oid-array.c                         |  12 +-
 pack-bitmap-write.c                 |   2 +-
 packfile.c                          |   3 +-
 reflog.c                            |   2 +-
 repository.c                        |  14 ++
 repository.h                        |   4 +
 revision.c                          |   4 +-
 setup.c                             |  22 +++
 setup.h                             |   1 +
 t/helper/test-delete-gpgsig.c       |  62 ++++++
 t/helper/test-tool.c                |   1 +
 t/helper/test-tool.h                |   1 +
 t/t1006-cat-file.sh                 | 379 +++++++++++++++++++++---------------
 t/t1016-compatObjectFormat.sh       | 280 ++++++++++++++++++++++++++
 t/t1016/gpg                         |   2 +
 t/test-lib-functions.sh             |  17 +-
 tree-walk.c                         |  58 +++---
 tree-walk.h                         |   7 +-
 tree.c                              |   2 +-
 walker.c                            |   2 +-
 62 files changed, 1843 insertions(+), 349 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH 01/30] object-file-convert: Stubs for converting from one object format to another
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 20:42   ` Eric Sunshine
  2023-09-27 19:55 ` [PATCH 02/30] oid-array: Teach oid-array to handle multiple kinds of oids Eric W. Biederman
                   ` (30 subsequent siblings)
  31 siblings, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Two basic functions are provided:
- convert_object_file Takes an object file it's type and hash algorithm
  and converts it into the equivalent object file that would
  have been generated with hash algorithm "to".

  For blob objects there is no converstion to be done and it is an
  error to use this function on them.

  For commit, tree, and tag objects embedded oids are replaced by the
  oids of the objects they refer to with those objects and their
  object ids reencoded in with the hash algorithm "to".  Signatures
  are rearranged so that they remain valid after the object has
  been reencoded.

- repo_oid_to_algop which takes an oid that refers to an object file
  and returns the oid of the equavalent object file generated
  with the target hash algorithm.

The pair of files object-file-convert.c and object-file-convert.h are
introduced to hold as much of this logic as possible to keep this
conversion logic cleanly separated from everything else and in the
hopes that someday the code will be clean enough git can support
compiling out support for sha1 and the various conversion functions.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Makefile              |  1 +
 object-file-convert.c | 57 +++++++++++++++++++++++++++++++++++++++++++
 object-file-convert.h | 24 ++++++++++++++++++
 3 files changed, 82 insertions(+)
 create mode 100644 object-file-convert.c
 create mode 100644 object-file-convert.h

diff --git a/Makefile b/Makefile
index 577630936535..f7e824f25cda 100644
--- a/Makefile
+++ b/Makefile
@@ -1073,6 +1073,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += notes.o
+LIB_OBJS += object-file-convert.o
 LIB_OBJS += object-file.o
 LIB_OBJS += object-name.o
 LIB_OBJS += object.o
diff --git a/object-file-convert.c b/object-file-convert.c
new file mode 100644
index 000000000000..ba3e18f6af44
--- /dev/null
+++ b/object-file-convert.c
@@ -0,0 +1,57 @@
+#include "git-compat-util.h"
+#include "gettext.h"
+#include "strbuf.h"
+#include "repository.h"
+#include "hash-ll.h"
+#include "object.h"
+#include "object-file-convert.h"
+
+int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
+		      const struct git_hash_algo *to, struct object_id *dest)
+{
+	/*
+	 * If the source alogirthm is not set, then we're using the
+	 * default hash algorithm for that object.
+	 */
+	const struct git_hash_algo *from =
+		src->algo ? &hash_algos[src->algo] : repo->hash_algo;
+
+	if (from == to) {
+		if (src != dest)
+			oidcpy(dest, src);
+		return 0;
+	}
+	return -1;
+}
+
+int convert_object_file(struct strbuf *outbuf,
+			const struct git_hash_algo *from,
+			const struct git_hash_algo *to,
+			const void *buf, size_t len,
+			enum object_type type,
+			int gentle)
+{
+	int ret;
+
+	/* Don't call this function when no conversion is necessary */
+	if ((from == to) || (type == OBJ_BLOB))
+		die("Refusing noop object file conversion");
+
+	switch (type) {
+	case OBJ_COMMIT:
+	case OBJ_TREE:
+	case OBJ_TAG:
+	default:
+		/* Not implemented yet, so fail. */
+		ret = -1;
+		break;
+	}
+	if (!ret)
+		return 0;
+	if (gentle) {
+		strbuf_release(outbuf);
+		return ret;
+	}
+	die(_("Failed to convert object from %s to %s"),
+		from->name, to->name);
+}
diff --git a/object-file-convert.h b/object-file-convert.h
new file mode 100644
index 000000000000..a4f802aa8eea
--- /dev/null
+++ b/object-file-convert.h
@@ -0,0 +1,24 @@
+#ifndef OBJECT_CONVERT_H
+#define OBJECT_CONVERT_H
+
+struct repository;
+struct object_id;
+struct git_hash_algo;
+struct strbuf;
+#include "object.h"
+
+int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
+		      const struct git_hash_algo *to, struct object_id *dest);
+
+/*
+ * Convert an object file from one hash algorithm to another algorithm.
+ * Return -1 on failure, 0 on success.
+ */
+int convert_object_file(struct strbuf *outbuf,
+			const struct git_hash_algo *from,
+			const struct git_hash_algo *to,
+			const void *buf, size_t len,
+			enum object_type type,
+			int gentle);
+
+#endif /* OBJECT_CONVERT_H */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 02/30] oid-array: Teach oid-array to handle multiple kinds of oids
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 01/30] object-file-convert: Stubs for converting from one object format to another Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 23:20   ` Eric Sunshine
  2023-09-27 19:55 ` [PATCH 03/30] object-names: Support input of oids in any supported hash Eric W. Biederman
                   ` (29 subsequent siblings)
  31 siblings, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

While looking at how to handle input of both SHA-1 and SHA-256 oids in
get_oid_with_context, I realized that the oid_array in
repo_for_each_abbrev might have more than one kind of oid stored in it
simulataneously.

Update to oid_array_append to ensure that oids added to an oid array
always have an algorithm set.

Update void_hashcmp to first verify two oids use the same hash algorithm
before comparing them to each other.

With that oid-array should be safe to use with differnt kinds of
oids simultaneously.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 oid-array.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/oid-array.c b/oid-array.c
index 8e4717746c31..1f36651754ed 100644
--- a/oid-array.c
+++ b/oid-array.c
@@ -6,12 +6,20 @@ void oid_array_append(struct oid_array *array, const struct object_id *oid)
 {
 	ALLOC_GROW(array->oid, array->nr + 1, array->alloc);
 	oidcpy(&array->oid[array->nr++], oid);
+	if (!oid->algo)
+		oid_set_algo(&array->oid[array->nr - 1], the_hash_algo);
 	array->sorted = 0;
 }
 
-static int void_hashcmp(const void *a, const void *b)
+static int void_hashcmp(const void *va, const void *vb)
 {
-	return oidcmp(a, b);
+	const struct object_id *a = va, *b = vb;
+	int ret;
+	if (a->algo == b->algo)
+		ret = oidcmp(a, b);
+	else
+		ret = a->algo > b->algo ? 1 : -1;
+	return ret;
 }
 
 void oid_array_sort(struct oid_array *array)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 03/30] object-names: Support input of oids in any supported hash
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 01/30] object-file-convert: Stubs for converting from one object format to another Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 02/30] oid-array: Teach oid-array to handle multiple kinds of oids Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 23:29   ` Eric Sunshine
  2023-09-27 19:55 ` [PATCH 04/30] repository: add a compatibility hash algorithm Eric W. Biederman
                   ` (28 subsequent siblings)
  31 siblings, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Support short oids encoded in any algorithm, while ensuring enough of
the oid is specified to disambiguate between all of the oids in the
repository encoded in any algorithm.

By default have the code continue to only accept oids specified in the
storage hash algorithm of the repository, but when something is
ambiguous display all of the possible oids from any oid encoding.

A new flag is added GET_OID_HASH_ANY that when supplied causes the
code to accept oids specified in any hash algorithm, and to return the
oids that were resolved.

This implements the functionality that allows both SHA-1 and SHA-256
object names, from the "Object names on the command line" section of
the hash function transition document.

Care is taken in get_short_oid so that when the result is ambiguous
the output remains the same of GIT_OID_HASH_ANY was not supplied.
If GET_OID_HASH_ANY was supplied objects of any hash algorithm
that match the prefix are displayed.

This required updating repo_for_each_abbrev to give it a parameter
so that it knows to look at all hash algorithms.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/rev-parse.c |  2 +-
 hash-ll.h           |  1 +
 object-name.c       | 49 +++++++++++++++++++++++++++++++++++----------
 object-name.h       |  3 ++-
 4 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index fde8861ca4e0..43e96765400c 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -882,7 +882,7 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				continue;
 			}
 			if (skip_prefix(arg, "--disambiguate=", &arg)) {
-				repo_for_each_abbrev(the_repository, arg,
+				repo_for_each_abbrev(the_repository, arg, the_hash_algo,
 						     show_abbrev, NULL);
 				continue;
 			}
diff --git a/hash-ll.h b/hash-ll.h
index 10d84cc20888..2cfde63ae1cf 100644
--- a/hash-ll.h
+++ b/hash-ll.h
@@ -145,6 +145,7 @@ struct object_id {
 #define GET_OID_RECORD_PATH     0200
 #define GET_OID_ONLY_TO_DIE    04000
 #define GET_OID_REQUIRE_PATH  010000
+#define GET_OID_HASH_ANY      020000
 
 #define GET_OID_DISAMBIGUATORS \
 	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
diff --git a/object-name.c b/object-name.c
index 0bfa29dbbfe9..976b7106821b 100644
--- a/object-name.c
+++ b/object-name.c
@@ -25,6 +25,7 @@
 #include "midx.h"
 #include "commit-reach.h"
 #include "date.h"
+#include "object-file-convert.h"
 
 static int get_oid_oneline(struct repository *r, const char *, struct object_id *, struct commit_list *);
 
@@ -49,6 +50,7 @@ struct disambiguate_state {
 
 static void update_candidates(struct disambiguate_state *ds, const struct object_id *current)
 {
+	/* The hash algorithm of the current has already been filtered */
 	if (ds->always_call_fn) {
 		ds->ambiguous = ds->fn(ds->repo, current, ds->cb_data) ? 1 : 0;
 		return;
@@ -134,6 +136,8 @@ static void unique_in_midx(struct multi_pack_index *m,
 {
 	uint32_t num, i, first = 0;
 	const struct object_id *current = NULL;
+	int len = ds->len > ds->repo->hash_algo->hexsz ?
+		ds->repo->hash_algo->hexsz : ds->len;
 	num = m->num_objects;
 
 	if (!num)
@@ -149,7 +153,7 @@ static void unique_in_midx(struct multi_pack_index *m,
 	for (i = first; i < num && !ds->ambiguous; i++) {
 		struct object_id oid;
 		current = nth_midxed_object_oid(&oid, m, i);
-		if (!match_hash(ds->len, ds->bin_pfx.hash, current->hash))
+		if (!match_hash(len, ds->bin_pfx.hash, current->hash))
 			break;
 		update_candidates(ds, current);
 	}
@@ -159,6 +163,8 @@ static void unique_in_pack(struct packed_git *p,
 			   struct disambiguate_state *ds)
 {
 	uint32_t num, i, first = 0;
+	int len = ds->len > ds->repo->hash_algo->hexsz ?
+		ds->repo->hash_algo->hexsz : ds->len;
 
 	if (p->multi_pack_index)
 		return;
@@ -177,7 +183,7 @@ static void unique_in_pack(struct packed_git *p,
 	for (i = first; i < num && !ds->ambiguous; i++) {
 		struct object_id oid;
 		nth_packed_object_id(&oid, p, i);
-		if (!match_hash(ds->len, ds->bin_pfx.hash, oid.hash))
+		if (!match_hash(len, ds->bin_pfx.hash, oid.hash))
 			break;
 		update_candidates(ds, &oid);
 	}
@@ -188,6 +194,10 @@ static void find_short_packed_object(struct disambiguate_state *ds)
 	struct multi_pack_index *m;
 	struct packed_git *p;
 
+	/* Skip, unless oids from the storage hash algorithm are wanted */
+	if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
+		return;
+
 	for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
 	     m = m->next)
 		unique_in_midx(m, ds);
@@ -326,11 +336,12 @@ int set_disambiguate_hint_config(const char *var, const char *value)
 
 static int init_object_disambiguation(struct repository *r,
 				      const char *name, int len,
+				      const struct git_hash_algo *algo,
 				      struct disambiguate_state *ds)
 {
 	int i;
 
-	if (len < MINIMUM_ABBREV || len > the_hash_algo->hexsz)
+	if (len < MINIMUM_ABBREV || len > GIT_MAX_HEXSZ)
 		return -1;
 
 	memset(ds, 0, sizeof(*ds));
@@ -357,6 +368,7 @@ static int init_object_disambiguation(struct repository *r,
 	ds->len = len;
 	ds->hex_pfx[len] = '\0';
 	ds->repo = r;
+	ds->bin_pfx.algo = algo ? hash_algo_by_ptr(algo) : GIT_HASH_UNKNOWN;
 	prepare_alt_odb(r);
 	return 0;
 }
@@ -491,9 +503,10 @@ static int repo_collect_ambiguous(struct repository *r UNUSED,
 	return collect_ambiguous(oid, data);
 }
 
-static int sort_ambiguous(const void *a, const void *b, void *ctx)
+static int sort_ambiguous(const void *va, const void *vb, void *ctx)
 {
 	struct repository *sort_ambiguous_repo = ctx;
+	const struct object_id *a = va, *b = vb;
 	int a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
 	int b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
 	int a_type_sort;
@@ -503,8 +516,13 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
 	 * Sorts by hash within the same object type, just as
 	 * oid_array_for_each_unique() would do.
 	 */
-	if (a_type == b_type)
-		return oidcmp(a, b);
+	if (a_type == b_type) {
+		/* Is the hash algorithm the same? */
+		if (a->algo == b->algo)
+			return oidcmp(a, b);
+		else
+			return a->algo > b->algo ? 1 : -1;
+	}
 
 	/*
 	 * Between object types show tags, then commits, and finally
@@ -533,8 +551,12 @@ static enum get_oid_result get_short_oid(struct repository *r,
 	int status;
 	struct disambiguate_state ds;
 	int quietly = !!(flags & GET_OID_QUIETLY);
+	const struct git_hash_algo *algo = r->hash_algo;
+
+	if (flags & GET_OID_HASH_ANY)
+		algo = NULL;
 
-	if (init_object_disambiguation(r, name, len, &ds) < 0)
+	if (init_object_disambiguation(r, name, len, algo, &ds) < 0)
 		return -1;
 
 	if (HAS_MULTI_BITS(flags & GET_OID_DISAMBIGUATORS))
@@ -553,6 +575,7 @@ static enum get_oid_result get_short_oid(struct repository *r,
 	else
 		ds.fn = default_disambiguate_hint;
 
+
 	find_short_object_filename(&ds);
 	find_short_packed_object(&ds);
 	status = finish_object_disambiguation(&ds, oid);
@@ -588,7 +611,7 @@ static enum get_oid_result get_short_oid(struct repository *r,
 		if (!ds.ambiguous)
 			ds.fn = NULL;
 
-		repo_for_each_abbrev(r, ds.hex_pfx, collect_ambiguous, &collect);
+		repo_for_each_abbrev(r, ds.hex_pfx, algo, collect_ambiguous, &collect);
 		sort_ambiguous_oid_array(r, &collect);
 
 		if (oid_array_for_each(&collect, show_ambiguous_object, &out))
@@ -610,15 +633,17 @@ static enum get_oid_result get_short_oid(struct repository *r,
 }
 
 int repo_for_each_abbrev(struct repository *r, const char *prefix,
+			 const struct git_hash_algo *algo,
 			 each_abbrev_fn fn, void *cb_data)
 {
 	struct oid_array collect = OID_ARRAY_INIT;
 	struct disambiguate_state ds;
 	int ret;
 
-	if (init_object_disambiguation(r, prefix, strlen(prefix), &ds) < 0)
+	if (init_object_disambiguation(r, prefix, strlen(prefix), algo, &ds) < 0)
 		return -1;
 
+	ds.bin_pfx.algo = GIT_HASH_UNKNOWN;
 	ds.always_call_fn = 1;
 	ds.fn = repo_collect_ambiguous;
 	ds.cb_data = &collect;
@@ -787,10 +812,12 @@ void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
 int repo_find_unique_abbrev_r(struct repository *r, char *hex,
 			      const struct object_id *oid, int len)
 {
+	const struct git_hash_algo *algo =
+		oid->algo ? &hash_algos[oid->algo] : r->hash_algo;
 	struct disambiguate_state ds;
 	struct min_abbrev_data mad;
 	struct object_id oid_ret;
-	const unsigned hexsz = r->hash_algo->hexsz;
+	const unsigned hexsz = algo->hexsz;
 
 	if (len < 0) {
 		unsigned long count = repo_approximate_object_count(r);
@@ -826,7 +853,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
 
 	find_abbrev_len_packed(&mad);
 
-	if (init_object_disambiguation(r, hex, mad.cur_len, &ds) < 0)
+	if (init_object_disambiguation(r, hex, mad.cur_len, algo, &ds) < 0)
 		return -1;
 
 	ds.fn = repo_extend_abbrev_len;
diff --git a/object-name.h b/object-name.h
index 9ae522307148..064ddc97d1fe 100644
--- a/object-name.h
+++ b/object-name.h
@@ -67,7 +67,8 @@ enum get_oid_result get_oid_with_context(struct repository *repo, const char *st
 
 
 typedef int each_abbrev_fn(const struct object_id *oid, void *);
-int repo_for_each_abbrev(struct repository *r, const char *prefix, each_abbrev_fn, void *);
+int repo_for_each_abbrev(struct repository *r, const char *prefix,
+			 const struct git_hash_algo *algo, each_abbrev_fn, void *);
 
 int set_disambiguate_hint_config(const char *var, const char *value);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 04/30] repository: add a compatibility hash algorithm
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (2 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 03/30] object-names: Support input of oids in any supported hash Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

We currently have support for using a full stage 4 SHA-256
implementation.  However, we'd like to support interoperability with
SHA-1 repositories as well.  The transition plan anticipates a
compatibility hash algorithm configuration option that we can use to
implement support for this.  Let's add an element to the repository
structure that indicates the compatibility hash algorithm so we can use
it when we need to consider interoperability between algorithms.

Add a helper function repo_set_compat_hash_algo that takes a
compatibility hash algorithm and sets "repo->compat_hash_algo".  If
GIT_HASH_UNKNOWN is passed as the compatibilty hash algorithm
"repo->compat_hash_algo" is set to NULL.

For now, the code always results in "repo->compat_hash_algo" always
being set to NULL, but that will change once a configuration option
is added.

Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 repository.c | 8 ++++++++
 repository.h | 4 ++++
 setup.c      | 3 +++
 3 files changed, 15 insertions(+)

diff --git a/repository.c b/repository.c
index a7679ceeaa45..80252b79e93e 100644
--- a/repository.c
+++ b/repository.c
@@ -104,6 +104,13 @@ void repo_set_hash_algo(struct repository *repo, int hash_algo)
 	repo->hash_algo = &hash_algos[hash_algo];
 }
 
+void repo_set_compat_hash_algo(struct repository *repo, int algo)
+{
+	if (hash_algo_by_ptr(repo->hash_algo) == algo)
+		BUG("hash_algo and compat_hash_algo match");
+	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
+}
+
 /*
  * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
  * Return 0 upon success and a non-zero value upon failure.
@@ -184,6 +191,7 @@ int repo_init(struct repository *repo,
 		goto error;
 
 	repo_set_hash_algo(repo, format.hash_algo);
+	repo_set_compat_hash_algo(repo, GIT_HASH_UNKNOWN);
 	repo->repository_format_worktree_config = format.worktree_config;
 
 	/* take ownership of format.partial_clone */
diff --git a/repository.h b/repository.h
index 5f18486f6465..bf3fc601cc53 100644
--- a/repository.h
+++ b/repository.h
@@ -160,6 +160,9 @@ struct repository {
 	/* Repository's current hash algorithm, as serialized on disk. */
 	const struct git_hash_algo *hash_algo;
 
+	/* Repository's compatibility hash algorithm. */
+	const struct git_hash_algo *compat_hash_algo;
+
 	/* A unique-id for tracing purposes. */
 	int trace2_repo_id;
 
@@ -199,6 +202,7 @@ void repo_set_gitdir(struct repository *repo, const char *root,
 		     const struct set_gitdir_args *extra_args);
 void repo_set_worktree(struct repository *repo, const char *path);
 void repo_set_hash_algo(struct repository *repo, int algo);
+void repo_set_compat_hash_algo(struct repository *repo, int compat_algo);
 void initialize_the_repository(void);
 RESULT_MUST_BE_USED
 int repo_init(struct repository *r, const char *gitdir, const char *worktree);
diff --git a/setup.c b/setup.c
index ef9f79b8885e..deb5a33fe9e1 100644
--- a/setup.c
+++ b/setup.c
@@ -1572,6 +1572,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		}
 		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			repo_set_compat_hash_algo(the_repository,
+						  GIT_HASH_UNKNOWN);
 			the_repository->repository_format_worktree_config =
 				repo_fmt.worktree_config;
 			/* take ownership of repo_fmt.partial_clone */
@@ -1665,6 +1667,7 @@ void check_repository_format(struct repository_format *fmt)
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	repo_set_hash_algo(the_repository, fmt->hash_algo);
+	repo_set_compat_hash_algo(the_repository, GIT_HASH_UNKNOWN);
 	the_repository->repository_format_worktree_config =
 		fmt->worktree_config;
 	the_repository->repository_format_partial_clone =
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (3 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 04/30] repository: add a compatibility hash algorithm Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-28  7:14   ` Eric Sunshine
  2023-09-27 19:55 ` [PATCH 06/30] loose: Compatibilty short name support Eric W. Biederman
                   ` (26 subsequent siblings)
  31 siblings, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W . Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

As part of the transition plan, we'd like to add a file in the .git
directory that maps loose objects between SHA-1 and SHA-256.  Let's
implement the specification in the transition plan and store this data
on a per-repository basis in struct repository.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 Makefile              |   1 +
 loose.c               | 245 ++++++++++++++++++++++++++++++++++++++++++
 loose.h               |  22 ++++
 object-file-convert.c |  14 ++-
 object-store-ll.h     |   3 +
 object.c              |   2 +
 repository.c          |   6 ++
 7 files changed, 292 insertions(+), 1 deletion(-)
 create mode 100644 loose.c
 create mode 100644 loose.h

diff --git a/Makefile b/Makefile
index f7e824f25cda..3c18664def9a 100644
--- a/Makefile
+++ b/Makefile
@@ -1053,6 +1053,7 @@ LIB_OBJS += list-objects-filter.o
 LIB_OBJS += list-objects.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
+LIB_OBJS += loose.o
 LIB_OBJS += ls-refs.o
 LIB_OBJS += mailinfo.o
 LIB_OBJS += mailmap.o
diff --git a/loose.c b/loose.c
new file mode 100644
index 000000000000..28d11593b2ea
--- /dev/null
+++ b/loose.c
@@ -0,0 +1,245 @@
+#include "git-compat-util.h"
+#include "hash.h"
+#include "path.h"
+#include "object-store.h"
+#include "hex.h"
+#include "wrapper.h"
+#include "gettext.h"
+#include "loose.h"
+#include "lockfile.h"
+
+static const char *loose_object_header = "# loose-object-idx\n";
+
+static inline int should_use_loose_object_map(struct repository *repo)
+{
+	return repo->compat_hash_algo && repo->gitdir;
+}
+
+void loose_object_map_init(struct loose_object_map **map)
+{
+	struct loose_object_map *m;
+	m = xmalloc(sizeof(**map));
+	m->to_compat = kh_init_oid_map();
+	m->to_storage = kh_init_oid_map();
+	*map = m;
+}
+
+static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const struct object_id *value)
+{
+	khiter_t pos;
+	int ret;
+	struct object_id *stored;
+
+	pos = kh_put_oid_map(map, *key, &ret);
+
+	/* This item already exists in the map. */
+	if (ret == 0)
+		return 0;
+
+	stored = xmalloc(sizeof(*stored));
+	oidcpy(stored, value);
+	kh_value(map, pos) = stored;
+	return 1;
+}
+
+static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir)
+{
+	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
+	FILE *fp;
+
+	if (!dir->loose_map)
+		loose_object_map_init(&dir->loose_map);
+
+	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
+	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree);
+
+	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
+	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob);
+
+	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
+	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid);
+
+	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
+	fp = fopen(path.buf, "rb");
+	if (!fp)
+		return 0;
+
+	errno = 0;
+	if (strbuf_getwholeline(&buf, fp, '\n') || strcmp(buf.buf, loose_object_header))
+		goto err;
+	while (!strbuf_getline_lf(&buf, fp)) {
+		const char *p;
+		struct object_id oid, compat_oid;
+		if (parse_oid_hex_algop(buf.buf, &oid, &p, repo->hash_algo) ||
+		    *p++ != ' ' ||
+		    parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) ||
+		    p != buf.buf + buf.len)
+			goto err;
+		insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid);
+		insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid);
+	}
+
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return errno ? -1 : 0;
+err:
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return -1;
+}
+
+int repo_read_loose_object_map(struct repository *repo)
+{
+	struct object_directory *dir;
+
+	if (!should_use_loose_object_map(repo))
+		return 0;
+
+	prepare_alt_odb(repo);
+
+	for (dir = repo->objects->odb; dir; dir = dir->next) {
+		if (load_one_loose_object_map(repo, dir) < 0) {
+			return -1;
+		}
+	}
+	return 0;
+}
+
+int repo_write_loose_object_map(struct repository *repo)
+{
+	kh_oid_map_t *map = repo->objects->odb->loose_map->to_compat;
+	struct lock_file lock;
+	int fd;
+	khiter_t iter;
+	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
+
+	if (!should_use_loose_object_map(repo))
+		return 0;
+
+	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
+	fd = hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1);
+	iter = kh_begin(map);
+	if (write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0)
+		goto errout;
+
+	for (; iter != kh_end(map); iter++) {
+		if (kh_exist(map, iter)) {
+			if (oideq(&kh_key(map, iter), the_hash_algo->empty_tree) ||
+			    oideq(&kh_key(map, iter), the_hash_algo->empty_blob))
+				continue;
+			strbuf_addf(&buf, "%s %s\n", oid_to_hex(&kh_key(map, iter)), oid_to_hex(kh_value(map, iter)));
+			if (write_in_full(fd, buf.buf, buf.len) < 0)
+				goto errout;
+			strbuf_reset(&buf);
+		}
+	}
+	strbuf_release(&buf);
+	if (commit_lock_file(&lock) < 0) {
+		error_errno(_("could not write loose object index %s"), path.buf);
+		strbuf_release(&path);
+		return -1;
+	}
+	strbuf_release(&path);
+	return 0;
+errout:
+	rollback_lock_file(&lock);
+	strbuf_release(&buf);
+	error_errno(_("failed to write loose object index %s\n"), path.buf);
+	strbuf_release(&path);
+	return -1;
+}
+
+static int write_one_object(struct repository *repo, const struct object_id *oid,
+			    const struct object_id *compat_oid)
+{
+	struct lock_file lock;
+	int fd;
+	struct stat st;
+	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
+
+	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
+	hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1);
+
+	fd = open(path.buf, O_WRONLY | O_CREAT | O_APPEND, 0666);
+	if (fd < 0)
+		goto errout;
+	if (fstat(fd, &st) < 0)
+		goto errout;
+	if (!st.st_size && write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0)
+		goto errout;
+
+	strbuf_addf(&buf, "%s %s\n", oid_to_hex(oid), oid_to_hex(compat_oid));
+	if (write_in_full(fd, buf.buf, buf.len) < 0)
+		goto errout;
+	if (close(fd))
+		goto errout;
+	adjust_shared_perm(path.buf);
+	rollback_lock_file(&lock);
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return 0;
+errout:
+	error_errno(_("failed to write loose object index %s\n"), path.buf);
+	close(fd);
+	rollback_lock_file(&lock);
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return -1;
+}
+
+int repo_add_loose_object_map(struct repository *repo, const struct object_id *oid,
+			      const struct object_id *compat_oid)
+{
+	int inserted = 0;
+
+	if (!should_use_loose_object_map(repo))
+		return 0;
+
+	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_compat, oid, compat_oid);
+	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_storage, compat_oid, oid);
+	if (inserted)
+		return write_one_object(repo, oid, compat_oid);
+	return 0;
+}
+
+int repo_loose_object_map_oid(struct repository *repo,
+			      const struct object_id *src,
+			      const struct git_hash_algo *to,
+			      struct object_id *dest)
+
+{
+	struct object_directory *dir;
+	kh_oid_map_t *map;
+	khiter_t pos;
+
+	for (dir = repo->objects->odb; dir; dir = dir->next) {
+		struct loose_object_map *loose_map = dir->loose_map;
+		if (!loose_map)
+			continue;
+		map = (to == repo->compat_hash_algo) ?
+			loose_map->to_compat :
+			loose_map->to_storage;
+		pos = kh_get_oid_map(map, *src);
+		if (pos < kh_end(map)) {
+			oidcpy(dest, kh_value(map, pos));
+			return 0;
+		}
+	}
+	return -1;
+}
+
+void loose_object_map_clear(struct loose_object_map **map)
+{
+	struct loose_object_map *m = *map;
+	struct object_id *oid;
+
+	if (!m)
+		return;
+
+	kh_foreach_value(m->to_compat, oid, free(oid));
+	kh_foreach_value(m->to_storage, oid, free(oid));
+	kh_destroy_oid_map(m->to_compat);
+	kh_destroy_oid_map(m->to_storage);
+	free(m);
+	*map = NULL;
+}
diff --git a/loose.h b/loose.h
new file mode 100644
index 000000000000..2c2957072c5f
--- /dev/null
+++ b/loose.h
@@ -0,0 +1,22 @@
+#ifndef LOOSE_H
+#define LOOSE_H
+
+#include "khash.h"
+
+struct loose_object_map {
+	kh_oid_map_t *to_compat;
+	kh_oid_map_t *to_storage;
+};
+
+void loose_object_map_init(struct loose_object_map **map);
+void loose_object_map_clear(struct loose_object_map **map);
+int repo_loose_object_map_oid(struct repository *repo,
+			      const struct object_id *src,
+			      const struct git_hash_algo *dest_algo,
+			      struct object_id *dest);
+int repo_add_loose_object_map(struct repository *repo, const struct object_id *oid,
+			      const struct object_id *compat_oid);
+int repo_read_loose_object_map(struct repository *repo);
+int repo_write_loose_object_map(struct repository *repo);
+
+#endif
diff --git a/object-file-convert.c b/object-file-convert.c
index ba3e18f6af44..4d62ed192bf0 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -4,6 +4,7 @@
 #include "repository.h"
 #include "hash-ll.h"
 #include "object.h"
+#include "loose.h"
 #include "object-file-convert.h"
 
 int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
@@ -21,7 +22,18 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 			oidcpy(dest, src);
 		return 0;
 	}
-	return -1;
+	if (repo_loose_object_map_oid(repo, src, to, dest)) {
+		/*
+		 * We may have loaded the object map at repo initialization but
+		 * another process (perhaps upstream of a pipe from us) may have
+		 * written a new object into the map.  If the object is missing,
+		 * let's reload the map to see if the object has appeared.
+		 */
+		repo_read_loose_object_map(repo);
+		if (repo_loose_object_map_oid(repo, src, to, dest))
+			return -1;
+	}
+	return 0;
 }
 
 int convert_object_file(struct strbuf *outbuf,
diff --git a/object-store-ll.h b/object-store-ll.h
index 26a3895c821c..bc76d6bec80d 100644
--- a/object-store-ll.h
+++ b/object-store-ll.h
@@ -26,6 +26,9 @@ struct object_directory {
 	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
 	struct oidtree *loose_objects_cache;
 
+	/* Map between object IDs for loose objects. */
+	struct loose_object_map *loose_map;
+
 	/*
 	 * This is a temporary object store created by the tmp_objdir
 	 * facility. Disable ref updates since the objects in the store
diff --git a/object.c b/object.c
index 2c61e4c86217..186a0a47c0fb 100644
--- a/object.c
+++ b/object.c
@@ -13,6 +13,7 @@
 #include "alloc.h"
 #include "packfile.h"
 #include "commit-graph.h"
+#include "loose.h"
 
 unsigned int get_max_object_index(void)
 {
@@ -540,6 +541,7 @@ void free_object_directory(struct object_directory *odb)
 {
 	free(odb->path);
 	odb_clear_loose_cache(odb);
+	loose_object_map_clear(&odb->loose_map);
 	free(odb);
 }
 
diff --git a/repository.c b/repository.c
index 80252b79e93e..6214f61cf4e7 100644
--- a/repository.c
+++ b/repository.c
@@ -14,6 +14,7 @@
 #include "read-cache-ll.h"
 #include "remote.h"
 #include "setup.h"
+#include "loose.h"
 #include "submodule-config.h"
 #include "sparse-index.h"
 #include "trace2.h"
@@ -109,6 +110,8 @@ void repo_set_compat_hash_algo(struct repository *repo, int algo)
 	if (hash_algo_by_ptr(repo->hash_algo) == algo)
 		BUG("hash_algo and compat_hash_algo match");
 	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
+	if (repo->compat_hash_algo)
+		repo_read_loose_object_map(repo);
 }
 
 /*
@@ -201,6 +204,9 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	if (repo->compat_hash_algo)
+		repo_read_loose_object_map(repo);
+
 	clear_repository_format(&format);
 	return 0;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 06/30] loose: Compatibilty short name support
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (4 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 07/30] object-file: Update the loose object map when writing loose objects Eric W. Biederman
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Update loose_objects_cache when udpating the loose objects map.  This
oidtree is used to discover which oids are possibilities when
resolving short names, and it can support a mixture of sha1
and sha256 oids.

With this any oid recorded objects/loose-objects-idx is usable
for resolving an oid to an object.

To make this maintainable a helper insert_loose_map is factored
out of load_one_loose_object_map and repo_add_loose_object_map,
and then modified to also update the loose_objects_cache.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 loose.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/loose.c b/loose.c
index 28d11593b2ea..bb50d43cd1d9 100644
--- a/loose.c
+++ b/loose.c
@@ -7,6 +7,7 @@
 #include "gettext.h"
 #include "loose.h"
 #include "lockfile.h"
+#include "oidtree.h"
 
 static const char *loose_object_header = "# loose-object-idx\n";
 
@@ -42,6 +43,21 @@ static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const
 	return 1;
 }
 
+static int insert_loose_map(struct object_directory *odb,
+			    const struct object_id *oid,
+			    const struct object_id *compat_oid)
+{
+	struct loose_object_map *map = odb->loose_map;
+	int inserted = 0;
+
+	inserted |= insert_oid_pair(map->to_compat, oid, compat_oid);
+	inserted |= insert_oid_pair(map->to_storage, compat_oid, oid);
+	if (inserted)
+		oidtree_insert(odb->loose_objects_cache, compat_oid);
+
+	return inserted;
+}
+
 static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir)
 {
 	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
@@ -49,15 +65,14 @@ static int load_one_loose_object_map(struct repository *repo, struct object_dire
 
 	if (!dir->loose_map)
 		loose_object_map_init(&dir->loose_map);
+	if (!dir->loose_objects_cache) {
+		ALLOC_ARRAY(dir->loose_objects_cache, 1);
+		oidtree_init(dir->loose_objects_cache);
+	}
 
-	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
-	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree);
-
-	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
-	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob);
-
-	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
-	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid);
+	insert_loose_map(dir, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
+	insert_loose_map(dir, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
+	insert_loose_map(dir, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
 
 	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
 	fp = fopen(path.buf, "rb");
@@ -75,8 +90,7 @@ static int load_one_loose_object_map(struct repository *repo, struct object_dire
 		    parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) ||
 		    p != buf.buf + buf.len)
 			goto err;
-		insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid);
-		insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid);
+		insert_loose_map(dir, &oid, &compat_oid);
 	}
 
 	strbuf_release(&buf);
@@ -195,8 +209,7 @@ int repo_add_loose_object_map(struct repository *repo, const struct object_id *o
 	if (!should_use_loose_object_map(repo))
 		return 0;
 
-	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_compat, oid, compat_oid);
-	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_storage, compat_oid, oid);
+	inserted = insert_loose_map(repo->objects->odb, oid, compat_oid);
 	if (inserted)
 		return write_one_object(repo, oid, compat_oid);
 	return 0;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 07/30] object-file: Update the loose object map when writing loose objects
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (5 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 06/30] loose: Compatibilty short name support Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 08/30] object-file: Add a compat_oid_in parameter to write_object_file_flags Eric W. Biederman
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

To implement SHA1 compatibility on SHA256 repositories the loose
object map needs to be updated whenver a loose object is written.
Updating the loose object map this way allows git to support
the old hash algorithm in constant time.

The functions write_loose_object, and stream_loose_object are
the only two functions that write to the loose object store.

Update stream_loose_object to compute the compatibiilty hash, update
the loose object, and then call repo_add_loose_object_map to update
the loose object map.

Update write_object_file_flags to convert the object into
it's compatibility encoding, hash the compatibility encoding,
write the object, and then update the loose object map.

Update force_object_loose to lookup the hash of the compatibility
encoding, write the loose object, and then update the loose object
map.

Update write_object_file_literally to convert the object into it's
compatibility hash encoding, hash the compatibility enconding, write
the object, and then update the loose object map, when the type string
is a known type.  For objects with an unknown type this results in a
partially broken repository, as the objects are not mapped.

The point of write_object_file_literally is to generate a partially
broken repository for testing.  For testing skipping writing the loose
object map is much more useful than refusing to write the broken
object at all.

Except that the loose objects are updated before the loose object map
I have not done any analysis to see how robust this scheme is in the
event of failure.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 object-file.c | 113 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 95 insertions(+), 18 deletions(-)

diff --git a/object-file.c b/object-file.c
index 7c7afe579364..4ad31f25c555 100644
--- a/object-file.c
+++ b/object-file.c
@@ -43,6 +43,8 @@
 #include "setup.h"
 #include "submodule.h"
 #include "fsck.h"
+#include "loose.h"
+#include "object-file-convert.h"
 
 /* The maximum size for an object header. */
 #define MAX_HEADER_LEN 32
@@ -1952,9 +1954,12 @@ static int start_loose_object_common(struct strbuf *tmp_file,
 				     const char *filename, unsigned flags,
 				     git_zstream *stream,
 				     unsigned char *buf, size_t buflen,
-				     git_hash_ctx *c,
+				     git_hash_ctx *c, git_hash_ctx *compat_c,
 				     char *hdr, int hdrlen)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	int fd;
 
 	fd = create_tmpfile(tmp_file, filename);
@@ -1974,14 +1979,18 @@ static int start_loose_object_common(struct strbuf *tmp_file,
 	git_deflate_init(stream, zlib_compression_level);
 	stream->next_out = buf;
 	stream->avail_out = buflen;
-	the_hash_algo->init_fn(c);
+	algo->init_fn(c);
+	if (compat && compat_c)
+		compat->init_fn(compat_c);
 
 	/*  Start to feed header to zlib stream */
 	stream->next_in = (unsigned char *)hdr;
 	stream->avail_in = hdrlen;
 	while (git_deflate(stream, 0) == Z_OK)
 		; /* nothing */
-	the_hash_algo->update_fn(c, hdr, hdrlen);
+	algo->update_fn(c, hdr, hdrlen);
+	if (compat && compat_c)
+		compat->update_fn(compat_c, hdr, hdrlen);
 
 	return fd;
 }
@@ -1990,16 +1999,21 @@ static int start_loose_object_common(struct strbuf *tmp_file,
  * Common steps for the inner git_deflate() loop for writing loose
  * objects. Returns what git_deflate() returns.
  */
-static int write_loose_object_common(git_hash_ctx *c,
+static int write_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c,
 				     git_zstream *stream, const int flush,
 				     unsigned char *in0, const int fd,
 				     unsigned char *compressed,
 				     const size_t compressed_len)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	int ret;
 
 	ret = git_deflate(stream, flush ? Z_FINISH : 0);
-	the_hash_algo->update_fn(c, in0, stream->next_in - in0);
+	algo->update_fn(c, in0, stream->next_in - in0);
+	if (compat && compat_c)
+		compat->update_fn(compat_c, in0, stream->next_in - in0);
 	if (write_in_full(fd, compressed, stream->next_out - compressed) < 0)
 		die_errno(_("unable to write loose object file"));
 	stream->next_out = compressed;
@@ -2014,15 +2028,21 @@ static int write_loose_object_common(git_hash_ctx *c,
  * - End the compression of zlib stream.
  * - Get the calculated oid to "oid".
  */
-static int end_loose_object_common(git_hash_ctx *c, git_zstream *stream,
-				   struct object_id *oid)
+static int end_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c,
+				   git_zstream *stream, struct object_id *oid,
+				   struct object_id *compat_oid)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	int ret;
 
 	ret = git_deflate_end_gently(stream);
 	if (ret != Z_OK)
 		return ret;
-	the_hash_algo->final_oid_fn(oid, c);
+	algo->final_oid_fn(oid, c);
+	if (compat && compat_c)
+		compat->final_oid_fn(compat_oid, compat_c);
 
 	return Z_OK;
 }
@@ -2046,7 +2066,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 
 	fd = start_loose_object_common(&tmp_file, filename.buf, flags,
 				       &stream, compressed, sizeof(compressed),
-				       &c, hdr, hdrlen);
+				       &c, NULL, hdr, hdrlen);
 	if (fd < 0)
 		return -1;
 
@@ -2056,14 +2076,14 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	do {
 		unsigned char *in0 = stream.next_in;
 
-		ret = write_loose_object_common(&c, &stream, 1, in0, fd,
+		ret = write_loose_object_common(&c, NULL, &stream, 1, in0, fd,
 						compressed, sizeof(compressed));
 	} while (ret == Z_OK);
 
 	if (ret != Z_STREAM_END)
 		die(_("unable to deflate new object %s (%d)"), oid_to_hex(oid),
 		    ret);
-	ret = end_loose_object_common(&c, &stream, &parano_oid);
+	ret = end_loose_object_common(&c, NULL, &stream, &parano_oid, NULL);
 	if (ret != Z_OK)
 		die(_("deflateEnd on object %s failed (%d)"), oid_to_hex(oid),
 		    ret);
@@ -2108,10 +2128,12 @@ static int freshen_packed_object(const struct object_id *oid)
 int stream_loose_object(struct input_stream *in_stream, size_t len,
 			struct object_id *oid)
 {
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
+	struct object_id compat_oid;
 	int fd, ret, err = 0, flush = 0;
 	unsigned char compressed[4096];
 	git_zstream stream;
-	git_hash_ctx c;
+	git_hash_ctx c, compat_c;
 	struct strbuf tmp_file = STRBUF_INIT;
 	struct strbuf filename = STRBUF_INIT;
 	int dirlen;
@@ -2135,7 +2157,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 	 */
 	fd = start_loose_object_common(&tmp_file, filename.buf, 0,
 				       &stream, compressed, sizeof(compressed),
-				       &c, hdr, hdrlen);
+				       &c, &compat_c, hdr, hdrlen);
 	if (fd < 0) {
 		err = -1;
 		goto cleanup;
@@ -2153,7 +2175,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 			if (in_stream->is_finished)
 				flush = 1;
 		}
-		ret = write_loose_object_common(&c, &stream, flush, in0, fd,
+		ret = write_loose_object_common(&c, &compat_c, &stream, flush, in0, fd,
 						compressed, sizeof(compressed));
 		/*
 		 * Unlike write_loose_object(), we do not have the entire
@@ -2176,7 +2198,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 	 */
 	if (ret != Z_STREAM_END)
 		die(_("unable to stream deflate new object (%d)"), ret);
-	ret = end_loose_object_common(&c, &stream, oid);
+	ret = end_loose_object_common(&c, &compat_c, &stream, oid, &compat_oid);
 	if (ret != Z_OK)
 		die(_("deflateEnd on stream object failed (%d)"), ret);
 	close_loose_object(fd, tmp_file.buf);
@@ -2203,6 +2225,8 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 	}
 
 	err = finalize_object_file(tmp_file.buf, filename.buf);
+	if (!err && compat)
+		err = repo_add_loose_object_map(the_repository, oid, &compat_oid);
 cleanup:
 	strbuf_release(&tmp_file);
 	strbuf_release(&filename);
@@ -2213,17 +2237,38 @@ int write_object_file_flags(const void *buf, unsigned long len,
 			    enum object_type type, struct object_id *oid,
 			    unsigned flags)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
+	struct object_id compat_oid;
 	char hdr[MAX_HEADER_LEN];
 	int hdrlen = sizeof(hdr);
 
+	/* Generate compat_oid */
+	if (compat) {
+		if (type == OBJ_BLOB)
+			hash_object_file(compat, buf, len, type, &compat_oid);
+		else {
+			struct strbuf converted = STRBUF_INIT;
+			convert_object_file(&converted, algo, compat,
+					    buf, len, type, 0);
+			hash_object_file(compat, converted.buf, converted.len,
+					 type, &compat_oid);
+			strbuf_release(&converted);
+		}
+	}
+
 	/* Normally if we have it in the pack then we do not bother writing
 	 * it out into .git/objects/??/?{38} file.
 	 */
-	write_object_file_prepare(the_hash_algo, buf, len, type, oid, hdr,
-				  &hdrlen);
+	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
 		return 0;
-	return write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags);
+	if (write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags))
+		return -1;
+	if (compat)
+		return repo_add_loose_object_map(repo, oid, &compat_oid);
+	return 0;
 }
 
 int write_object_file_literally(const void *buf, unsigned long len,
@@ -2231,7 +2276,27 @@ int write_object_file_literally(const void *buf, unsigned long len,
 				unsigned flags)
 {
 	char *header;
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
+	struct object_id compat_oid;
 	int hdrlen, status = 0;
+	int compat_type = -1;
+
+	if (compat) {
+		compat_type = type_from_string_gently(type, -1, 1);
+		if (compat_type == OBJ_BLOB)
+			hash_object_file(compat, buf, len, compat_type,
+					 &compat_oid);
+		else if (compat_type != -1) {
+			struct strbuf converted = STRBUF_INIT;
+			convert_object_file(&converted, algo, compat,
+					    buf, len, compat_type, 0);
+			hash_object_file(compat, converted.buf, converted.len,
+					 compat_type, &compat_oid);
+			strbuf_release(&converted);
+		}
+	}
 
 	/* type string, SP, %lu of the length plus NUL must fit this */
 	hdrlen = strlen(type) + MAX_HEADER_LEN;
@@ -2244,6 +2309,8 @@ int write_object_file_literally(const void *buf, unsigned long len,
 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
 		goto cleanup;
 	status = write_loose_object(oid, header, hdrlen, buf, len, 0, 0);
+	if (compat_type != -1)
+		return repo_add_loose_object_map(repo, oid, &compat_oid);
 
 cleanup:
 	free(header);
@@ -2252,9 +2319,12 @@ int write_object_file_literally(const void *buf, unsigned long len,
 
 int force_object_loose(const struct object_id *oid, time_t mtime)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	void *buf;
 	unsigned long len;
 	struct object_info oi = OBJECT_INFO_INIT;
+	struct object_id compat_oid;
 	enum object_type type;
 	char hdr[MAX_HEADER_LEN];
 	int hdrlen;
@@ -2267,8 +2337,15 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
 	oi.contentp = &buf;
 	if (oid_object_info_extended(the_repository, oid, &oi, 0))
 		return error(_("cannot read object for %s"), oid_to_hex(oid));
+	if (compat) {
+		if (repo_oid_to_algop(repo, oid, compat, &compat_oid))
+			return error(_("cannot map object %s to %s"),
+				     oid_to_hex(oid), compat->name);
+	}
 	hdrlen = format_object_header(hdr, sizeof(hdr), type, len);
 	ret = write_loose_object(oid, hdr, hdrlen, buf, len, mtime, 0);
+	if (!ret && compat)
+		ret = repo_add_loose_object_map(the_repository, oid, &compat_oid);
 	free(buf);
 
 	return ret;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 08/30] object-file: Add a compat_oid_in parameter to write_object_file_flags
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (6 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 07/30] object-file: Update the loose object map when writing loose objects Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 09/30] commit: write commits for both hashes Eric W. Biederman
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

To create the proper signatures for commit objects both versions of
the commit object need to be generated and signed.  After that it is
a waste to throw away the work of generating the compatibility hash
so update write_object_file_flags to take a compatibility hash input
parameter that it can use to skip the work of generating the
compatability hash.

Update the places that don't generate the compatability hash to
pass NULL so it is easy to tell write_object_file_flags should
not attempt to use their compatability hash.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 cache-tree.c      | 2 +-
 object-file.c     | 6 ++++--
 object-store-ll.h | 4 ++--
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 641427ed410a..ddc7d3d86959 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -448,7 +448,7 @@ static int update_one(struct cache_tree *it,
 		hash_object_file(the_hash_algo, buffer.buf, buffer.len,
 				 OBJ_TREE, &it->oid);
 	} else if (write_object_file_flags(buffer.buf, buffer.len, OBJ_TREE,
-					   &it->oid, flags & WRITE_TREE_SILENT
+					   &it->oid, NULL, flags & WRITE_TREE_SILENT
 					   ? HASH_SILENT : 0)) {
 		strbuf_release(&buffer);
 		return -1;
diff --git a/object-file.c b/object-file.c
index 4ad31f25c555..d66d11890696 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2235,7 +2235,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 
 int write_object_file_flags(const void *buf, unsigned long len,
 			    enum object_type type, struct object_id *oid,
-			    unsigned flags)
+			    struct object_id *compat_oid_in, unsigned flags)
 {
 	struct repository *repo = the_repository;
 	const struct git_hash_algo *algo = repo->hash_algo;
@@ -2246,7 +2246,9 @@ int write_object_file_flags(const void *buf, unsigned long len,
 
 	/* Generate compat_oid */
 	if (compat) {
-		if (type == OBJ_BLOB)
+		if (compat_oid_in)
+			oidcpy(&compat_oid, compat_oid_in);
+		else if (type == OBJ_BLOB)
 			hash_object_file(compat, buf, len, type, &compat_oid);
 		else {
 			struct strbuf converted = STRBUF_INIT;
diff --git a/object-store-ll.h b/object-store-ll.h
index bc76d6bec80d..c5f2bb2fc2fe 100644
--- a/object-store-ll.h
+++ b/object-store-ll.h
@@ -255,11 +255,11 @@ void hash_object_file(const struct git_hash_algo *algo, const void *buf,
 
 int write_object_file_flags(const void *buf, unsigned long len,
 			    enum object_type type, struct object_id *oid,
-			    unsigned flags);
+			    struct object_id *comapt_oid_in, unsigned flags);
 static inline int write_object_file(const void *buf, unsigned long len,
 				    enum object_type type, struct object_id *oid)
 {
-	return write_object_file_flags(buf, len, type, oid, 0);
+	return write_object_file_flags(buf, len, type, oid, NULL, 0);
 }
 
 int write_object_file_literally(const void *buf, unsigned long len,
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 09/30] commit: write commits for both hashes
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (7 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 08/30] object-file: Add a compat_oid_in parameter to write_object_file_flags Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 10/30] commit: Convert mergetag before computing the signature of a commit Eric W. Biederman
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W . Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree.  In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.

However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data.  While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.

Consequently, we don't want to use the standard mapping that occurs when
we write an object.  Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.

We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents.  If we're signing
the commit, we then sign both versions and append both signatures to
both buffers.  To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.

In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 commit.c | 179 +++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 134 insertions(+), 45 deletions(-)

diff --git a/commit.c b/commit.c
index b3223478bc2a..46696ede8981 100644
--- a/commit.c
+++ b/commit.c
@@ -28,6 +28,7 @@
 #include "shallow.h"
 #include "tree.h"
 #include "hook.h"
+#include "object-file-convert.h"
 
 static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
 
@@ -1100,12 +1101,11 @@ static const char *gpg_sig_headers[] = {
 	"gpgsig-sha256",
 };
 
-int sign_with_header(struct strbuf *buf, const char *keyid)
+static int add_commit_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
 {
-	struct strbuf sig = STRBUF_INIT;
 	int inspos, copypos;
 	const char *eoh;
-	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(the_hash_algo)];
+	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(algo)];
 	int gpg_sig_header_len = strlen(gpg_sig_header);
 
 	/* find the end of the header */
@@ -1115,15 +1115,8 @@ int sign_with_header(struct strbuf *buf, const char *keyid)
 	else
 		inspos = eoh - buf->buf + 1;
 
-	if (!keyid || !*keyid)
-		keyid = get_signing_key();
-	if (sign_buffer(buf, &sig, keyid)) {
-		strbuf_release(&sig);
-		return -1;
-	}
-
-	for (copypos = 0; sig.buf[copypos]; ) {
-		const char *bol = sig.buf + copypos;
+	for (copypos = 0; sig->buf[copypos]; ) {
+		const char *bol = sig->buf + copypos;
 		const char *eol = strchrnul(bol, '\n');
 		int len = (eol - bol) + !!*eol;
 
@@ -1136,11 +1129,17 @@ int sign_with_header(struct strbuf *buf, const char *keyid)
 		inspos += len;
 		copypos += len;
 	}
-	strbuf_release(&sig);
 	return 0;
 }
 
-
+static int sign_commit_to_strbuf(struct strbuf *sig, struct strbuf *buf, const char *keyid)
+{
+	if (!keyid || !*keyid)
+		keyid = get_signing_key();
+	if (sign_buffer(buf, sig, keyid))
+		return -1;
+	return 0;
+}
 
 int parse_signed_commit(const struct commit *commit,
 			struct strbuf *payload, struct strbuf *signature,
@@ -1599,70 +1598,160 @@ N_("Warning: commit message did not conform to UTF-8.\n"
    "You may want to amend it after fixing the message, or set the config\n"
    "variable i18n.commitEncoding to the encoding your project uses.\n");
 
-int commit_tree_extended(const char *msg, size_t msg_len,
-			 const struct object_id *tree,
-			 struct commit_list *parents, struct object_id *ret,
-			 const char *author, const char *committer,
-			 const char *sign_commit,
-			 struct commit_extra_header *extra)
+static void write_commit_tree(struct strbuf *buffer, const char *msg, size_t msg_len,
+			      const struct object_id *tree,
+			      const struct object_id *parents, size_t parents_len,
+			      const char *author, const char *committer,
+			      struct commit_extra_header *extra)
 {
-	int result;
 	int encoding_is_utf8;
-	struct strbuf buffer;
-
-	assert_oid_type(tree, OBJ_TREE);
-
-	if (memchr(msg, '\0', msg_len))
-		return error("a NUL byte in commit log message not allowed.");
+	size_t i;
 
 	/* Not having i18n.commitencoding is the same as having utf-8 */
 	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
 
-	strbuf_init(&buffer, 8192); /* should avoid reallocs for the headers */
-	strbuf_addf(&buffer, "tree %s\n", oid_to_hex(tree));
+	strbuf_init(buffer, 8192); /* should avoid reallocs for the headers */
+	strbuf_addf(buffer, "tree %s\n", oid_to_hex(tree));
 
 	/*
 	 * NOTE! This ordering means that the same exact tree merged with a
 	 * different order of parents will be a _different_ changeset even
 	 * if everything else stays the same.
 	 */
-	while (parents) {
-		struct commit *parent = pop_commit(&parents);
-		strbuf_addf(&buffer, "parent %s\n",
-			    oid_to_hex(&parent->object.oid));
-	}
+	for (i = 0; i < parents_len; i++)
+		strbuf_addf(buffer, "parent %s\n", oid_to_hex(&parents[i]));
 
 	/* Person/date information */
 	if (!author)
 		author = git_author_info(IDENT_STRICT);
-	strbuf_addf(&buffer, "author %s\n", author);
+	strbuf_addf(buffer, "author %s\n", author);
 	if (!committer)
 		committer = git_committer_info(IDENT_STRICT);
-	strbuf_addf(&buffer, "committer %s\n", committer);
+	strbuf_addf(buffer, "committer %s\n", committer);
 	if (!encoding_is_utf8)
-		strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
+		strbuf_addf(buffer, "encoding %s\n", git_commit_encoding);
 
 	while (extra) {
-		add_extra_header(&buffer, extra);
+		add_extra_header(buffer, extra);
 		extra = extra->next;
 	}
-	strbuf_addch(&buffer, '\n');
+	strbuf_addch(buffer, '\n');
 
 	/* And add the comment */
-	strbuf_add(&buffer, msg, msg_len);
+	strbuf_add(buffer, msg, msg_len);
+}
 
-	/* And check the encoding */
-	if (encoding_is_utf8 && !verify_utf8(&buffer))
-		fprintf(stderr, _(commit_utf8_warn));
+int commit_tree_extended(const char *msg, size_t msg_len,
+			 const struct object_id *tree,
+			 struct commit_list *parents, struct object_id *ret,
+			 const char *author, const char *committer,
+			 const char *sign_commit,
+			 struct commit_extra_header *extra)
+{
+	struct repository *r = the_repository;
+	int result = 0;
+	int encoding_is_utf8;
+	struct strbuf buffer, compat_buffer;
+	struct strbuf sig = STRBUF_INIT, compat_sig = STRBUF_INIT;
+	struct object_id *parent_buf = NULL, *compat_oid = NULL;
+	struct object_id compat_oid_buf;
+	size_t i, nparents;
+
+	/* Not having i18n.commitencoding is the same as having utf-8 */
+	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
+
+	assert_oid_type(tree, OBJ_TREE);
+
+	if (memchr(msg, '\0', msg_len))
+		return error("a NUL byte in commit log message not allowed.");
+
+	nparents = commit_list_count(parents);
+	parent_buf = xcalloc(nparents, sizeof(*parent_buf));
+	for (i = 0; i < nparents; i++) {
+		struct commit *parent = pop_commit(&parents);
+		oidcpy(&parent_buf[i], &parent->object.oid);
+	}
+
+	/* should avoid reallocs for the headers */
+	strbuf_init(&buffer, 8192);
+	strbuf_init(&compat_buffer, 8192);
 
-	if (sign_commit && sign_with_header(&buffer, sign_commit)) {
+	write_commit_tree(&buffer, msg, msg_len, tree, parent_buf, nparents, author, committer, extra);
+	if (sign_commit && sign_commit_to_strbuf(&sig, &buffer, sign_commit)) {
 		result = -1;
 		goto out;
 	}
+	if (r->compat_hash_algo) {
+		struct object_id mapped_tree;
+		struct object_id *mapped_parents = xcalloc(nparents, sizeof(*mapped_parents));
+		if (repo_oid_to_algop(r, tree, r->compat_hash_algo, &mapped_tree)) {
+			result = -1;
+			free(mapped_parents);
+			goto out;
+		}
+		for (i = 0; i < nparents; i++)
+			if (repo_oid_to_algop(r, &parent_buf[i], r->compat_hash_algo, &mapped_parents[i])) {
+				result = -1;
+				free(mapped_parents);
+				goto out;
+			}
+		write_commit_tree(&compat_buffer, msg, msg_len, &mapped_tree,
+				  mapped_parents, nparents, author, committer, extra);
 
-	result = write_object_file(buffer.buf, buffer.len, OBJ_COMMIT, ret);
+		if (sign_commit && sign_commit_to_strbuf(&compat_sig, &compat_buffer, sign_commit)) {
+			result = -1;
+			goto out;
+		}
+	}
+
+	if (sign_commit) {
+		struct sig_pairs {
+			struct strbuf *sig;
+			const struct git_hash_algo *algo;
+		} bufs [2] = {
+			{ &compat_sig, r->compat_hash_algo },
+			{ &sig, r->hash_algo },
+		};
+		int i;
+
+		/*
+		 * We write algorithms in the order they were implemented in
+		 * Git to produce a stable hash when multiple algorithms are
+		 * used.
+		 */
+		if (r->compat_hash_algo && hash_algo_by_ptr(bufs[0].algo) > hash_algo_by_ptr(bufs[1].algo))
+			SWAP(bufs[0], bufs[1]);
+
+		/*
+		 * We traverse each algorithm in order, and apply the signature
+		 * to each buffer.
+		 */
+		for (i = 0; i < ARRAY_SIZE(bufs); i++) {
+			if (!bufs[i].algo)
+				continue;
+			add_commit_signature(&buffer, bufs[i].sig, bufs[i].algo);
+			if (r->compat_hash_algo)
+				add_commit_signature(&compat_buffer, bufs[i].sig, bufs[i].algo);
+		}
+	}
+
+	/* And check the encoding. */
+	if (encoding_is_utf8 && (!verify_utf8(&buffer) || !verify_utf8(&compat_buffer)))
+		fprintf(stderr, _(commit_utf8_warn));
+
+	if (r->compat_hash_algo) {
+		hash_object_file(r->compat_hash_algo, compat_buffer.buf, compat_buffer.len,
+			OBJ_COMMIT, &compat_oid_buf);
+		compat_oid = &compat_oid_buf;
+	}
+
+	result = write_object_file_flags(buffer.buf, buffer.len, OBJ_COMMIT,
+					 ret, compat_oid, 0);
 out:
 	strbuf_release(&buffer);
+	strbuf_release(&compat_buffer);
+	strbuf_release(&sig);
+	strbuf_release(&compat_sig);
 	return result;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 10/30] commit: Convert mergetag before computing the signature of a commit
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (8 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 09/30] commit: write commits for both hashes Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 11/30] commit: Export add_header_signature to support handling signatures on tags Eric W. Biederman
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

It so happens that commit mergetag lines embed a tag object.  So to
compute the compatible signature of a commit object that has mergetag
lines the compatible embedded tag must be computed first.

Implement this by duplicating and converting the commit extra headers
into the compatible version of the commit extra headers, that need
to be passed to commit_tree_extended.

To handle merge tags only the compatible extra headers need to be
computed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 commit.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index 46696ede8981..a6dac9a1957b 100644
--- a/commit.c
+++ b/commit.c
@@ -1355,6 +1355,39 @@ void append_merge_tag_headers(struct commit_list *parents,
 	}
 }
 
+static int convert_commit_extra_headers(struct commit_extra_header *orig,
+					struct commit_extra_header **result)
+{
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
+	const struct git_hash_algo *algo = the_repository->hash_algo;
+	struct commit_extra_header *extra = NULL, **tail = &extra;
+	struct strbuf out = STRBUF_INIT;
+	while (orig) {
+		struct commit_extra_header *new;
+		CALLOC_ARRAY(new, 1);
+		if (!strcmp(orig->key, "mergetag")) {
+			if (convert_object_file(&out, algo, compat,
+						orig->value, orig->len,
+						OBJ_TAG, 1)) {
+				free(new);
+				free_commit_extra_headers(extra);
+				return -1;
+			}
+			new->key = xstrdup("mergetag");
+			new->value = strbuf_detach(&out, &new->len);
+		} else {
+			new->key = xstrdup(orig->key);
+			new->len = orig->len;
+			new->value = xmemdupz(orig->value, orig->len);
+		}
+		*tail = new;
+		tail = &new->next;
+		orig = orig->next;
+	}
+	*result = extra;
+	return 0;
+}
+
 static void add_extra_header(struct strbuf *buffer,
 			     struct commit_extra_header *extra)
 {
@@ -1682,6 +1715,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 		goto out;
 	}
 	if (r->compat_hash_algo) {
+		struct commit_extra_header *compat_extra = NULL;
 		struct object_id mapped_tree;
 		struct object_id *mapped_parents = xcalloc(nparents, sizeof(*mapped_parents));
 		if (repo_oid_to_algop(r, tree, r->compat_hash_algo, &mapped_tree)) {
@@ -1695,8 +1729,14 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 				free(mapped_parents);
 				goto out;
 			}
+		if (convert_commit_extra_headers(extra, &compat_extra)) {
+			result = -1;
+			free(mapped_parents);
+			goto out;
+		}
 		write_commit_tree(&compat_buffer, msg, msg_len, &mapped_tree,
-				  mapped_parents, nparents, author, committer, extra);
+				  mapped_parents, nparents, author, committer, compat_extra);
+		free_commit_extra_headers(compat_extra);
 
 		if (sign_commit && sign_commit_to_strbuf(&compat_sig, &compat_buffer, sign_commit)) {
 			result = -1;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 11/30] commit: Export add_header_signature to support handling signatures on tags
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (9 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 10/30] commit: Convert mergetag before computing the signature of a commit Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 12/30] tag: sign both hashes Eric W. Biederman
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Rename add_commit_signature as add_header_signature, and expose it so
that it can be used for converting tags from one object format to
another.

Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 commit.c | 6 +++---
 commit.h | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index a6dac9a1957b..e75270171bc3 100644
--- a/commit.c
+++ b/commit.c
@@ -1101,7 +1101,7 @@ static const char *gpg_sig_headers[] = {
 	"gpgsig-sha256",
 };
 
-static int add_commit_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
+int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
 {
 	int inspos, copypos;
 	const char *eoh;
@@ -1769,9 +1769,9 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 		for (i = 0; i < ARRAY_SIZE(bufs); i++) {
 			if (!bufs[i].algo)
 				continue;
-			add_commit_signature(&buffer, bufs[i].sig, bufs[i].algo);
+			add_header_signature(&buffer, bufs[i].sig, bufs[i].algo);
 			if (r->compat_hash_algo)
-				add_commit_signature(&compat_buffer, bufs[i].sig, bufs[i].algo);
+				add_header_signature(&compat_buffer, bufs[i].sig, bufs[i].algo);
 		}
 	}
 
diff --git a/commit.h b/commit.h
index 28928833c544..03edcec0129f 100644
--- a/commit.h
+++ b/commit.h
@@ -370,5 +370,6 @@ int parse_buffer_signed_by_header(const char *buffer,
 				  struct strbuf *payload,
 				  struct strbuf *signature,
 				  const struct git_hash_algo *algop);
+int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo);
 
 #endif /* COMMIT_H */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 12/30] tag: sign both hashes
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (10 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 11/30] commit: Export add_header_signature to support handling signatures on tags Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 13/30] cache: add a function to read an OID of a specific algorithm Eric W. Biederman
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

When we write a tag the object oid is specific to the hash algorithm.

This matters when a tag is signed.  The hash transition plan calls for
signatures on both the sha1 form and the sha256 form of the object,
and for both of those signatures to live in the tag object.

To generate tag object with multiple signatures, first compute the
unsigned form of the tag, and then if the tag is being signed compute
the unsigned form of the tag with the compatibilityr hash.  Then
compute compute the signatures of both buffers.

Once the signatures are computed add them to both buffers.  This
allows computing the compatibility hash in do_sign, saving
write_object_file the expense of recomputing the compatibility tag
just to compute it's hash.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 builtin/tag.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index 3918eacbb57b..8c4bc28952c2 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -28,6 +28,7 @@
 #include "ref-filter.h"
 #include "date.h"
 #include "write-or-die.h"
+#include "object-file-convert.h"
 
 static const char * const git_tag_usage[] = {
 	N_("git tag [-a | -s | -u <key-id>] [-f] [-m <msg> | -F <file>] [-e]\n"
@@ -174,9 +175,43 @@ static int verify_tag(const char *name, const char *ref UNUSED,
 	return 0;
 }
 
-static int do_sign(struct strbuf *buffer)
+static int do_sign(struct strbuf *buffer, struct object_id **compat_oid,
+		   struct object_id *compat_oid_buf)
 {
-	return sign_buffer(buffer, buffer, get_signing_key());
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
+	struct strbuf sig = STRBUF_INIT, compat_sig = STRBUF_INIT;
+	struct strbuf compat_buf = STRBUF_INIT;
+	const char *keyid = get_signing_key();
+	int ret = -1;
+
+	if (sign_buffer(buffer, &sig, keyid))
+		return -1;
+
+	if (compat) {
+		const struct git_hash_algo *algo = the_repository->hash_algo;
+
+		if (convert_object_file(&compat_buf, algo, compat,
+					buffer->buf, buffer->len, OBJ_TAG, 1))
+			goto out;
+		if (sign_buffer(&compat_buf, &compat_sig, keyid))
+			goto out;
+		add_header_signature(&compat_buf, &sig, algo);
+		strbuf_addbuf(&compat_buf, &compat_sig);
+		hash_object_file(compat, compat_buf.buf, compat_buf.len,
+				 OBJ_TAG, compat_oid_buf);
+		*compat_oid = compat_oid_buf;
+	}
+
+	if (compat_sig.len)
+		add_header_signature(buffer, &compat_sig, compat);
+
+	strbuf_addbuf(buffer, &sig);
+	ret = 0;
+out:
+	strbuf_release(&sig);
+	strbuf_release(&compat_sig);
+	strbuf_release(&compat_buf);
+	return ret;
 }
 
 static const char tag_template[] =
@@ -249,9 +284,11 @@ static void write_tag_body(int fd, const struct object_id *oid)
 
 static int build_tag_object(struct strbuf *buf, int sign, struct object_id *result)
 {
-	if (sign && do_sign(buf) < 0)
+	struct object_id *compat_oid = NULL, compat_oid_buf;
+	if (sign && do_sign(buf, &compat_oid, &compat_oid_buf) < 0)
 		return error(_("unable to sign the tag"));
-	if (write_object_file(buf->buf, buf->len, OBJ_TAG, result) < 0)
+	if (write_object_file_flags(buf->buf, buf->len, OBJ_TAG, result,
+				    compat_oid, 0) < 0)
 		return error(_("unable to write tag file"));
 	return 0;
 }
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 13/30] cache: add a function to read an OID of a specific algorithm
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (11 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 12/30] tag: sign both hashes Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 14/30] object: Factor out parse_mode out of fast-import and tree-walk into in object.h Eric W. Biederman
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

Currently, we always read a object ID of the current algorithm with
oidread.  However, once we start converting objects, we'll need to
consider what happens when we want to read an object ID of a specific
algorithm, such as the compatibility algorithm.  To make this easier,
let's define oidread_algop, which specifies which algorithm we should
use for our object ID, and define oidread in terms of it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 hash.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hash.h b/hash.h
index 615ae0691d07..e064807c1733 100644
--- a/hash.h
+++ b/hash.h
@@ -73,10 +73,15 @@ static inline void oidclr(struct object_id *oid)
 	oid->algo = hash_algo_by_ptr(the_hash_algo);
 }
 
+static inline void oidread_algop(struct object_id *oid, const unsigned char *hash, const struct git_hash_algo *algop)
+{
+	memcpy(oid->hash, hash, algop->rawsz);
+	oid->algo = hash_algo_by_ptr(algop);
+}
+
 static inline void oidread(struct object_id *oid, const unsigned char *hash)
 {
-	memcpy(oid->hash, hash, the_hash_algo->rawsz);
-	oid->algo = hash_algo_by_ptr(the_hash_algo);
+	oidread_algop(oid, hash, the_hash_algo);
 }
 
 static inline int is_empty_blob_sha1(const unsigned char *sha1)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 14/30] object: Factor out parse_mode out of fast-import and tree-walk into in object.h
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (12 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 13/30] cache: add a function to read an OID of a specific algorithm Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 15/30] object-file-convert: add a function to convert trees between algorithms Eric W. Biederman
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

builtin/fast-import.c and tree-walk.c have almost identical version of
get_mode.  The two functions started out the same but have diverged
slightly.  The version in fast-import changed mode to a uint16_t to
save memory.  The version in tree-walk started erroring if no mode was
present.

As far as I can tell both of these changes are valid for both of the
callers, so add the both changes and place the common parsing helper
in object.h

Rename the helper from get_mode to parse_mode so it does not
conflict with another helper named get_mode in diff-no-index.c

This will be used shortly in a new helper decode_tree_entry_raw
which is used to compute cmpatibility objects as part of
the sha256 transition.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/fast-import.c | 18 ++----------------
 object.h              | 18 ++++++++++++++++++
 tree-walk.c           | 22 +++-------------------
 3 files changed, 23 insertions(+), 35 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 4dbb10aff3da..2c645fcfbe3f 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -1235,20 +1235,6 @@ static void *gfi_unpack_entry(
 	return unpack_entry(the_repository, p, oe->idx.offset, &type, sizep);
 }
 
-static const char *get_mode(const char *str, uint16_t *modep)
-{
-	unsigned char c;
-	uint16_t mode = 0;
-
-	while ((c = *str++) != ' ') {
-		if (c < '0' || c > '7')
-			return NULL;
-		mode = (mode << 3) + (c - '0');
-	}
-	*modep = mode;
-	return str;
-}
-
 static void load_tree(struct tree_entry *root)
 {
 	struct object_id *oid = &root->versions[1].oid;
@@ -1286,7 +1272,7 @@ static void load_tree(struct tree_entry *root)
 		t->entries[t->entry_count++] = e;
 
 		e->tree = NULL;
-		c = get_mode(c, &e->versions[1].mode);
+		c = parse_mode(c, &e->versions[1].mode);
 		if (!c)
 			die("Corrupt mode in %s", oid_to_hex(oid));
 		e->versions[0].mode = e->versions[1].mode;
@@ -2275,7 +2261,7 @@ static void file_change_m(const char *p, struct branch *b)
 	struct object_id oid;
 	uint16_t mode, inline_data = 0;
 
-	p = get_mode(p, &mode);
+	p = parse_mode(p, &mode);
 	if (!p)
 		die("Corrupt mode: %s", command_buf.buf);
 	switch (mode) {
diff --git a/object.h b/object.h
index 114d45954d08..70c8d4ae63dc 100644
--- a/object.h
+++ b/object.h
@@ -190,6 +190,24 @@ void *create_object(struct repository *r, const struct object_id *oid, void *obj
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
+
+static inline const char *parse_mode(const char *str, uint16_t *modep)
+{
+	unsigned char c;
+	unsigned int mode = 0;
+
+	if (*str == ' ')
+		return NULL;
+
+	while ((c = *str++) != ' ') {
+		if (c < '0' || c > '7')
+			return NULL;
+		mode = (mode << 3) + (c - '0');
+	}
+	*modep = mode;
+	return str;
+}
+
 /*
  * Returns the object, having parsed it to find out what it is.
  *
diff --git a/tree-walk.c b/tree-walk.c
index 29ead71be173..3af50a01c2c7 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -10,27 +10,11 @@
 #include "pathspec.h"
 #include "json-writer.h"
 
-static const char *get_mode(const char *str, unsigned int *modep)
-{
-	unsigned char c;
-	unsigned int mode = 0;
-
-	if (*str == ' ')
-		return NULL;
-
-	while ((c = *str++) != ' ') {
-		if (c < '0' || c > '7')
-			return NULL;
-		mode = (mode << 3) + (c - '0');
-	}
-	*modep = mode;
-	return str;
-}
-
 static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size, struct strbuf *err)
 {
 	const char *path;
-	unsigned int mode, len;
+	unsigned int len;
+	uint16_t mode;
 	const unsigned hashsz = the_hash_algo->rawsz;
 
 	if (size < hashsz + 3 || buf[size - (hashsz + 1)]) {
@@ -38,7 +22,7 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 		return -1;
 	}
 
-	path = get_mode(buf, &mode);
+	path = parse_mode(buf, &mode);
 	if (!path) {
 		strbuf_addstr(err, _("malformed mode in tree entry"));
 		return -1;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 15/30] object-file-convert: add a function to convert trees between algorithms
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (13 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 14/30] object: Factor out parse_mode out of fast-import and tree-walk into in object.h Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 16/30] object-file-convert: convert tag objects when writing Eric W. Biederman
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W . Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

In the future, we're going to want to provide SHA-256 repositories that
have compatibility support for SHA-1 as well.  In order to do so, we'll
need to be able to convert tree objects from SHA-256 to SHA-1 by writing
a tree with each SHA-256 object ID mapped to a SHA-1 object ID.

We implement a function, convert_tree_object, that takes an existing
tree buffer and writes it to a new strbuf, converting between
algorithms.  Let's make this function generic, because while we only
need it to convert from the main algorithm to the compatibility
algorithm now, we may need to do the other way around in the future,
such as for transport.

We avoid reusing the code in decode_tree_entry because that code
normalizes data, and we don't want that here.  We want to produce a
complete round trip of data, so if, for example, the old entry had a
wrongly zero-padded mode, we'd want to preserve that when converting to
ensure a stable hash value.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 object-file-convert.c | 51 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/object-file-convert.c b/object-file-convert.c
index 4d62ed192bf0..a9e7a208a707 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -1,8 +1,10 @@
 #include "git-compat-util.h"
 #include "gettext.h"
 #include "strbuf.h"
+#include "hex.h"
 #include "repository.h"
 #include "hash-ll.h"
+#include "hash.h"
 #include "object.h"
 #include "loose.h"
 #include "object-file-convert.h"
@@ -36,6 +38,51 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 	return 0;
 }
 
+static int decode_tree_entry_raw(struct object_id *oid, const char **path,
+				 size_t *len, const struct git_hash_algo *algo,
+				 const char *buf, unsigned long size)
+{
+	uint16_t mode;
+	const unsigned hashsz = algo->rawsz;
+
+	if (size < hashsz + 3 || buf[size - (hashsz + 1)]) {
+		return -1;
+	}
+
+	*path = parse_mode(buf, &mode);
+	if (!*path || !**path)
+		return -1;
+	*len = strlen(*path) + 1;
+
+	oidread_algop(oid, (const unsigned char *)*path + *len, algo);
+	return 0;
+}
+
+static int convert_tree_object(struct strbuf *out,
+			       const struct git_hash_algo *from,
+			       const struct git_hash_algo *to,
+			       const char *buffer, size_t size)
+{
+	const char *p = buffer, *end = buffer + size;
+
+	while (p < end) {
+		struct object_id entry_oid, mapped_oid;
+		const char *path = NULL;
+		size_t pathlen;
+
+		if (decode_tree_entry_raw(&entry_oid, &path, &pathlen, from, p,
+					  end - p))
+			return error(_("failed to decode tree entry"));
+		if (repo_oid_to_algop(the_repository, &entry_oid, to, &mapped_oid))
+			return error(_("failed to map tree entry for %s"), oid_to_hex(&entry_oid));
+		strbuf_add(out, p, path - p);
+		strbuf_add(out, path, pathlen);
+		strbuf_add(out, mapped_oid.hash, to->rawsz);
+		p = path + pathlen + from->rawsz;
+	}
+	return 0;
+}
+
 int convert_object_file(struct strbuf *outbuf,
 			const struct git_hash_algo *from,
 			const struct git_hash_algo *to,
@@ -50,8 +97,10 @@ int convert_object_file(struct strbuf *outbuf,
 		die("Refusing noop object file conversion");
 
 	switch (type) {
-	case OBJ_COMMIT:
 	case OBJ_TREE:
+		ret = convert_tree_object(outbuf, from, to, buf, len);
+		break;
+	case OBJ_COMMIT:
 	case OBJ_TAG:
 	default:
 		/* Not implemented yet, so fail. */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 16/30] object-file-convert: convert tag objects when writing
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (14 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 15/30] object-file-convert: add a function to convert trees between algorithms Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 17/30] object-file-convert: Don't leak when converting tag objects Eric W. Biederman
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W . Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

When writing a tag object in a repository with both SHA-1 and SHA-256,
we'll need to convert our commit objects so that we can write the hash
values for both into the repository.  To do so, let's add a function to
convert tag objects.

Note that signatures for tag objects in the current algorithm trail the
message, and those for the alternate algorithm are in headers.
Therefore, we parse the tag object for both a trailing signature and a
header and then, when writing the other format, swap the two around.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 object-file-convert.c | 52 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/object-file-convert.c b/object-file-convert.c
index a9e7a208a707..777ae5b58036 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -7,6 +7,8 @@
 #include "hash.h"
 #include "object.h"
 #include "loose.h"
+#include "commit.h"
+#include "gpg-interface.h"
 #include "object-file-convert.h"
 
 int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
@@ -83,6 +85,52 @@ static int convert_tree_object(struct strbuf *out,
 	return 0;
 }
 
+static int convert_tag_object(struct strbuf *out,
+			      const struct git_hash_algo *from,
+			      const struct git_hash_algo *to,
+			      const char *buffer, size_t size)
+{
+	struct strbuf payload = STRBUF_INIT, temp = STRBUF_INIT, oursig = STRBUF_INIT, othersig = STRBUF_INIT;
+	size_t payload_size;
+	struct object_id oid, mapped_oid;
+	const char *p;
+
+	/* Add some slop for longer signature header in the new algorithm. */
+	strbuf_grow(out, size + 7);
+
+	/* Is there a signature for our algorithm? */
+	payload_size = parse_signed_buffer(buffer, size);
+	strbuf_add(&payload, buffer, payload_size);
+	if (payload_size != size) {
+		/* Yes, there is. */
+		strbuf_add(&oursig, buffer + payload_size, size - payload_size);
+	}
+	/* Now, is there a signature for the other algorithm? */
+	if (parse_buffer_signed_by_header(payload.buf, payload.len, &temp, &othersig, to)) {
+		/* Yes, there is. */
+		strbuf_swap(&payload, &temp);
+		strbuf_release(&temp);
+	}
+
+	/*
+	 * Our payload is now in payload and we may have up to two signatrures
+	 * in oursig and othersig.
+	 */
+	if (strncmp(payload.buf, "object ", 7) || payload.buf[from->hexsz + 7] != '\n')
+		return error("bogus tag object");
+	if (parse_oid_hex_algop(payload.buf + 7, &oid, &p, from) < 0)
+		return error("bad tag object ID");
+	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+		return error("unable to map tree %s in tag object",
+			     oid_to_hex(&oid));
+	strbuf_addf(out, "object %s", oid_to_hex(&mapped_oid));
+	strbuf_add(out, p, payload.len - (p - payload.buf));
+	strbuf_addbuf(out, &othersig);
+	if (oursig.len)
+		add_header_signature(out, &oursig, from);
+	return 0;
+}
+
 int convert_object_file(struct strbuf *outbuf,
 			const struct git_hash_algo *from,
 			const struct git_hash_algo *to,
@@ -100,8 +148,10 @@ int convert_object_file(struct strbuf *outbuf,
 	case OBJ_TREE:
 		ret = convert_tree_object(outbuf, from, to, buf, len);
 		break;
-	case OBJ_COMMIT:
 	case OBJ_TAG:
+		ret = convert_tag_object(outbuf, from, to, buf, len);
+		break;
+	case OBJ_COMMIT:
 	default:
 		/* Not implemented yet, so fail. */
 		ret = -1;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 17/30] object-file-convert: Don't leak when converting tag objects
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (15 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 16/30] object-file-convert: convert tag objects when writing Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 18/30] object-file-convert: convert commit objects when writing Eric W. Biederman
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Upon close examination I discovered that while brian's code to convert
tag objects was functionally correct, it leaked memory.

Rearrange the code so that all error checking happens before any
memory is allocated.

Add code to release the temporary strbufs the code uses.

The code pretty much assumes the tag object ends with a newline,
so add an explict test to verify that is the case.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 object-file-convert.c | 45 ++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/object-file-convert.c b/object-file-convert.c
index 777ae5b58036..822be9d0fdb8 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -90,44 +90,49 @@ static int convert_tag_object(struct strbuf *out,
 			      const struct git_hash_algo *to,
 			      const char *buffer, size_t size)
 {
-	struct strbuf payload = STRBUF_INIT, temp = STRBUF_INIT, oursig = STRBUF_INIT, othersig = STRBUF_INIT;
+	struct strbuf payload = STRBUF_INIT, oursig = STRBUF_INIT, othersig = STRBUF_INIT;
+	const int entry_len = from->hexsz + 7;
 	size_t payload_size;
 	struct object_id oid, mapped_oid;
 	const char *p;
 
-	/* Add some slop for longer signature header in the new algorithm. */
-	strbuf_grow(out, size + 7);
+	/* Consume the object line */
+	if ((entry_len >= size) ||
+	    memcmp(buffer, "object ", 7) || buffer[entry_len] != '\n')
+		return error("bogus tag object");
+	if (parse_oid_hex_algop(buffer + 7, &oid, &p, from) < 0)
+		return error("bad tag object ID");
+	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+		return error("unable to map tree %s in tag object",
+			     oid_to_hex(&oid));
+	size -= ((p + 1) - buffer);
+	buffer = p + 1;
 
 	/* Is there a signature for our algorithm? */
 	payload_size = parse_signed_buffer(buffer, size);
-	strbuf_add(&payload, buffer, payload_size);
 	if (payload_size != size) {
 		/* Yes, there is. */
 		strbuf_add(&oursig, buffer + payload_size, size - payload_size);
 	}
-	/* Now, is there a signature for the other algorithm? */
-	if (parse_buffer_signed_by_header(payload.buf, payload.len, &temp, &othersig, to)) {
-		/* Yes, there is. */
-		strbuf_swap(&payload, &temp);
-		strbuf_release(&temp);
-	}
 
+	/* Now, is there a signature for the other algorithm? */
+	parse_buffer_signed_by_header(buffer, payload_size, &payload, &othersig, to);
 	/*
 	 * Our payload is now in payload and we may have up to two signatrures
 	 * in oursig and othersig.
 	 */
-	if (strncmp(payload.buf, "object ", 7) || payload.buf[from->hexsz + 7] != '\n')
-		return error("bogus tag object");
-	if (parse_oid_hex_algop(payload.buf + 7, &oid, &p, from) < 0)
-		return error("bad tag object ID");
-	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
-		return error("unable to map tree %s in tag object",
-			     oid_to_hex(&oid));
-	strbuf_addf(out, "object %s", oid_to_hex(&mapped_oid));
-	strbuf_add(out, p, payload.len - (p - payload.buf));
-	strbuf_addbuf(out, &othersig);
+
+	/* Add some slop for longer signature header in the new algorithm. */
+	strbuf_grow(out, (7 + to->hexsz + 1) + size + 7);
+	strbuf_addf(out, "object %s\n", oid_to_hex(&mapped_oid));
+	strbuf_addbuf(out, &payload);
 	if (oursig.len)
 		add_header_signature(out, &oursig, from);
+	strbuf_addbuf(out, &othersig);
+
+	strbuf_release(&payload);
+	strbuf_release(&othersig);
+	strbuf_release(&oursig);
 	return 0;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 18/30] object-file-convert: convert commit objects when writing
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (16 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 17/30] object-file-convert: Don't leak when converting tag objects Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 19/30] object-file-convert: Convert commits that embed signed tags Eric W. Biederman
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W . Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

When writing a commit object in a repository with both SHA-1 and
SHA-256, we'll need to convert our commit objects so that we can write
the hash values for both into the repository.  To do so, let's add a
function to convert commit objects.

Read the commit object and map the tree value and any of the parent
values, and copy the rest of the commit through unmodified.  Note that
we don't need to modify the signature headers, because they are the
same under both algorithms.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 object-file-convert.c | 46 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/object-file-convert.c b/object-file-convert.c
index 822be9d0fdb8..f53e14e5a170 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -136,6 +136,48 @@ static int convert_tag_object(struct strbuf *out,
 	return 0;
 }
 
+static int convert_commit_object(struct strbuf *out,
+				 const struct git_hash_algo *from,
+				 const struct git_hash_algo *to,
+				 const char *buffer, size_t size)
+{
+	const char *tail = buffer;
+	const char *bufptr = buffer;
+	const int tree_entry_len = from->hexsz + 5;
+	const int parent_entry_len = from->hexsz + 7;
+	struct object_id oid, mapped_oid;
+	const char *p;
+
+	tail += size;
+	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
+			bufptr[tree_entry_len] != '\n')
+		return error("bogus commit object");
+	if (parse_oid_hex_algop(bufptr + 5, &oid, &p, from) < 0)
+		return error("bad tree pointer");
+
+	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+		return error("unable to map tree %s in commit object",
+			     oid_to_hex(&oid));
+	strbuf_addf(out, "tree %s\n", oid_to_hex(&mapped_oid));
+	bufptr = p + 1;
+
+	while (bufptr + parent_entry_len < tail && !memcmp(bufptr, "parent ", 7)) {
+		if (tail <= bufptr + parent_entry_len + 1 ||
+		    parse_oid_hex_algop(bufptr + 7, &oid, &p, from) ||
+		    *p != '\n')
+			return error("bad parents in commit");
+
+		if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+			return error("unable to map parent %s in commit object",
+				     oid_to_hex(&oid));
+
+		strbuf_addf(out, "parent %s\n", oid_to_hex(&mapped_oid));
+		bufptr = p + 1;
+	}
+	strbuf_add(out, bufptr, tail - bufptr);
+	return 0;
+}
+
 int convert_object_file(struct strbuf *outbuf,
 			const struct git_hash_algo *from,
 			const struct git_hash_algo *to,
@@ -150,13 +192,15 @@ int convert_object_file(struct strbuf *outbuf,
 		die("Refusing noop object file conversion");
 
 	switch (type) {
+	case OBJ_COMMIT:
+		ret = convert_commit_object(outbuf, from, to, buf, len);
+		break;
 	case OBJ_TREE:
 		ret = convert_tree_object(outbuf, from, to, buf, len);
 		break;
 	case OBJ_TAG:
 		ret = convert_tag_object(outbuf, from, to, buf, len);
 		break;
-	case OBJ_COMMIT:
 	default:
 		/* Not implemented yet, so fail. */
 		ret = -1;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 19/30] object-file-convert: Convert commits that embed signed tags
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (17 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 18/30] object-file-convert: convert commit objects when writing Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 20/30] object-file: Update object_info_extended to reencode objects Eric W. Biederman
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

As mentioned in the hash function transition plan commit mergetag
lines need to be handled.  The commit mergetag lines embed an entire
tag object in a commit object.

Keep the implementation sane if not fast by unembedding the tag
object, converting the tag object, and embedding the new tag object,
in the new commit object.

In the long run I don't expect any other approach is maintainable, as
tag objects may be extended in ways that require additional
translation.

To keep the implementation of convert_commit_object maintainable I
have modified convert_commit_object to process the lines in any order,
and to fail on unknown lines.  We can't know ahead of time if a new
line might embed something that needs translation or not so it is
better to fail and require the code to be updated instead of silently
mistranslating objects.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 object-file-convert.c | 104 +++++++++++++++++++++++++++++++++---------
 1 file changed, 82 insertions(+), 22 deletions(-)

diff --git a/object-file-convert.c b/object-file-convert.c
index f53e14e5a170..8ede9889a7ab 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -146,35 +146,95 @@ static int convert_commit_object(struct strbuf *out,
 	const int tree_entry_len = from->hexsz + 5;
 	const int parent_entry_len = from->hexsz + 7;
 	struct object_id oid, mapped_oid;
-	const char *p;
+	const char *p, *eol;
 
 	tail += size;
-	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
-			bufptr[tree_entry_len] != '\n')
-		return error("bogus commit object");
-	if (parse_oid_hex_algop(bufptr + 5, &oid, &p, from) < 0)
-		return error("bad tree pointer");
 
-	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
-		return error("unable to map tree %s in commit object",
-			     oid_to_hex(&oid));
-	strbuf_addf(out, "tree %s\n", oid_to_hex(&mapped_oid));
-	bufptr = p + 1;
+	while ((bufptr < tail) && (*bufptr != '\n')) {
+		eol = memchr(bufptr, '\n', tail - bufptr);
+		if (!eol)
+			return error(_("bad %s in commit"), "line");
+
+		if (((bufptr + 5) < eol) && !memcmp(bufptr, "tree ", 5))
+		{
+			if (((bufptr + tree_entry_len) != eol) ||
+			    parse_oid_hex_algop(bufptr + 5, &oid, &p, from) ||
+			    (p != eol))
+				return error(_("bad %s in commit"), "tree");
+
+			if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+				return error(_("unable to map %s %s in commit object"),
+					     "tree", oid_to_hex(&oid));
+			strbuf_addf(out, "tree %s\n", oid_to_hex(&mapped_oid));
+		}
+		else if (((bufptr + 7) < eol) && !memcmp(bufptr, "parent ", 7))
+		{
+			if (((bufptr + parent_entry_len) != eol) ||
+			    parse_oid_hex_algop(bufptr + 7, &oid, &p, from) ||
+			    (p != eol))
+				return error(_("bad %s in commit"), "parent");
 
-	while (bufptr + parent_entry_len < tail && !memcmp(bufptr, "parent ", 7)) {
-		if (tail <= bufptr + parent_entry_len + 1 ||
-		    parse_oid_hex_algop(bufptr + 7, &oid, &p, from) ||
-		    *p != '\n')
-			return error("bad parents in commit");
+			if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+				return error(_("unable to map %s %s in commit object"),
+					     "parent", oid_to_hex(&oid));
 
-		if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
-			return error("unable to map parent %s in commit object",
-				     oid_to_hex(&oid));
+			strbuf_addf(out, "parent %s\n", oid_to_hex(&mapped_oid));
+		}
+		else if (((bufptr + 9) < eol) && !memcmp(bufptr, "mergetag ", 9))
+		{
+			struct strbuf tag = STRBUF_INIT, new_tag = STRBUF_INIT;
 
-		strbuf_addf(out, "parent %s\n", oid_to_hex(&mapped_oid));
-		bufptr = p + 1;
+			/* Recover the tag object from the mergetag */
+			strbuf_add(&tag, bufptr + 9, (eol - (bufptr + 9)) + 1);
+
+			bufptr = eol + 1;
+			while ((bufptr < tail) && (*bufptr == ' ')) {
+				eol = memchr(bufptr, '\n', tail - bufptr);
+				if (!eol) {
+					strbuf_release(&tag);
+					return error(_("bad %s in commit"), "mergetag continuation");
+				}
+				strbuf_add(&tag, bufptr + 1, (eol - (bufptr + 1)) + 1);
+				bufptr = eol + 1;
+			}
+
+			/* Compute the new tag object */
+			if (convert_tag_object(&new_tag, from, to, tag.buf, tag.len)) {
+				strbuf_release(&tag);
+				strbuf_release(&new_tag);
+				return -1;
+			}
+
+			/* Write the new mergetag */
+			strbuf_addstr(out, "mergetag");
+			strbuf_add_lines(out, " ", new_tag.buf, new_tag.len);
+			strbuf_release(&tag);
+			strbuf_release(&new_tag);
+		}
+		else if (((bufptr + 7) < tail) && !memcmp(bufptr, "author ", 7))
+			strbuf_add(out, bufptr, (eol - bufptr) + 1);
+		else if (((bufptr + 10) < tail) && !memcmp(bufptr, "committer ", 10))
+			strbuf_add(out, bufptr, (eol - bufptr) + 1);
+		else if (((bufptr + 9) < tail) && !memcmp(bufptr, "encoding ", 9))
+			strbuf_add(out, bufptr, (eol - bufptr) + 1);
+		else if (((bufptr + 6) < tail) && !memcmp(bufptr, "gpgsig", 6))
+			strbuf_add(out, bufptr, (eol - bufptr) + 1);
+		else {
+			/* Unknown line fail it might embed an oid */
+			return -1;
+		}
+		/* Consume any trailing continuation lines */
+		bufptr = eol + 1;
+		while ((bufptr < tail) && (*bufptr == ' ')) {
+			eol = memchr(bufptr, '\n', tail - bufptr);
+			if (!eol)
+				return error(_("bad %s in commit"), "continuation");
+			strbuf_add(out, bufptr, (eol - bufptr) + 1);
+			bufptr = eol + 1;
+		}
 	}
-	strbuf_add(out, bufptr, tail - bufptr);
+	if (bufptr < tail)
+		strbuf_add(out, bufptr, tail - bufptr);
 	return 0;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 20/30] object-file: Update object_info_extended to reencode objects
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (18 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 19/30] object-file-convert: Convert commits that embed signed tags Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 21/30] repository: Implement extensions.compatObjectFormat Eric W. Biederman
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

oid_object_info_extended is updated to detect an oid encoding that
does not match the current repository, use repo_oid_to_algop to find
the correspoding oid in the current repository and to return the data
for the oid.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 object-file.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/object-file.c b/object-file.c
index d66d11890696..1601d624c9fd 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1662,10 +1662,101 @@ static int do_oid_object_info_extended(struct repository *r,
 	return 0;
 }
 
+static int oid_object_info_convert(struct repository *r,
+				   const struct object_id *input_oid,
+				   struct object_info *input_oi, unsigned flags)
+{
+	const struct git_hash_algo *input_algo = &hash_algos[input_oid->algo];
+	int do_die = flags & OBJECT_INFO_DIE_IF_CORRUPT;
+	struct strbuf type_name = STRBUF_INIT;
+	struct object_id oid, delta_base_oid;
+	struct object_info new_oi, *oi;
+	unsigned long size;
+	void *content;
+	int ret;
+
+	if (repo_oid_to_algop(r, input_oid, the_hash_algo, &oid)) {
+		if (do_die)
+			die(_("missing mapping of %s to %s"),
+			    oid_to_hex(input_oid), the_hash_algo->name);
+		return -1;
+	}
+
+	/* Is new_oi needed? */
+	oi = input_oi;
+	if (input_oi && (input_oi->delta_base_oid || input_oi->sizep ||
+			 input_oi->contentp)) {
+		new_oi = *input_oi;
+		/* Does delta_base_oid need to be converted? */
+		if (input_oi->delta_base_oid)
+			new_oi.delta_base_oid = &delta_base_oid;
+		/* Will the attributes differ when converted? */
+		if (input_oi->sizep || input_oi->contentp) {
+			new_oi.contentp = &content;
+			new_oi.sizep = &size;
+			new_oi.type_name = &type_name;
+		}
+		oi = &new_oi;
+	}
+
+	ret = oid_object_info_extended(r, &oid, oi, flags);
+	if (ret)
+		return -1;
+	if (oi == input_oi)
+		return ret;
+
+	if (new_oi.contentp) {
+		struct strbuf outbuf = STRBUF_INIT;
+		enum object_type type;
+
+		type = type_from_string_gently(type_name.buf, type_name.len,
+					       !do_die);
+		if (type == -1)
+			return -1;
+		if (type != OBJ_BLOB) {
+			ret = convert_object_file(&outbuf,
+						  the_hash_algo, input_algo,
+						  content, size, type, !do_die);
+			if (ret == -1)
+				return -1;
+			free(content);
+			size = outbuf.len;
+			content = strbuf_detach(&outbuf, NULL);
+		}
+		if (input_oi->sizep)
+			*input_oi->sizep = size;
+		if (input_oi->contentp)
+			*input_oi->contentp = content;
+		else
+			free(content);
+		if (input_oi->type_name)
+			*input_oi->type_name = type_name;
+		else
+			strbuf_release(&type_name);
+	}
+	if (new_oi.delta_base_oid == &delta_base_oid) {
+		if (repo_oid_to_algop(r, &delta_base_oid, input_algo,
+				 input_oi->delta_base_oid)) {
+			if (do_die)
+				die(_("missing mapping of %s to %s"),
+				    oid_to_hex(&delta_base_oid),
+				    input_algo->name);
+			return -1;
+		}
+	}
+	input_oi->whence = new_oi.whence;
+	input_oi->u = new_oi.u;
+	return ret;
+}
+
 int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 			     struct object_info *oi, unsigned flags)
 {
 	int ret;
+
+	if (oid->algo && (hash_algo_by_ptr(r->hash_algo) != oid->algo))
+		return oid_object_info_convert(r, oid, oi, flags);
+
 	obj_read_lock();
 	ret = do_oid_object_info_extended(r, oid, oi, flags);
 	obj_read_unlock();
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 21/30] repository: Implement extensions.compatObjectFormat
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (19 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 20/30] object-file: Update object_info_extended to reencode objects Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 21:39   ` Junio C Hamano
  2023-09-27 19:55 ` [PATCH 22/30] rev-parse: Add an --output-object-format parameter Eric W. Biederman
                   ` (10 subsequent siblings)
  31 siblings, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

Add a configuration option to enable updating and reading from
compatibility hash maps when git accesses the reposotiry.

Call the helper function repo_set_compat_hash_algo with the value
that compatObjectFormat is set to.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Documentation/config/extensions.txt | 12 ++++++++++++
 repository.c                        |  2 +-
 setup.c                             | 23 +++++++++++++++++++++--
 setup.h                             |  1 +
 4 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt
index bccaec7a9636..9f72e6d9f4f1 100644
--- a/Documentation/config/extensions.txt
+++ b/Documentation/config/extensions.txt
@@ -7,6 +7,18 @@ Note that this setting should only be set by linkgit:git-init[1] or
 linkgit:git-clone[1].  Trying to change it after initialization will not
 work and will produce hard-to-diagnose issues.
 
+extensions.compatObjectFormat::
+
+	Specify a compatitbility hash algorithm to use.  The acceptable values
+	are `sha1` and `sha256`.  The value specified must be different from the
+	value of extensions.objectFormat.  This allows client level
+	interoperability between git repositories whose objectFormat matches
+	this compatObjectFormat.  In particular when fully implemented the
+	pushes and pulls from a repository in whose objectFormat matches
+	compatObjectFormat.  As well as being able to use oids encoded in
+	compatObjectFormat in addition to oids encoded with objectFormat to
+	locally specify objects.
+
 extensions.worktreeConfig::
 	If enabled, then worktrees will load config settings from the
 	`$GIT_DIR/config.worktree` file in addition to the
diff --git a/repository.c b/repository.c
index 6214f61cf4e7..9d91536b613b 100644
--- a/repository.c
+++ b/repository.c
@@ -194,7 +194,7 @@ int repo_init(struct repository *repo,
 		goto error;
 
 	repo_set_hash_algo(repo, format.hash_algo);
-	repo_set_compat_hash_algo(repo, GIT_HASH_UNKNOWN);
+	repo_set_compat_hash_algo(repo, format.compat_hash_algo);
 	repo->repository_format_worktree_config = format.worktree_config;
 
 	/* take ownership of format.partial_clone */
diff --git a/setup.c b/setup.c
index deb5a33fe9e1..87b40472dbc5 100644
--- a/setup.c
+++ b/setup.c
@@ -598,6 +598,25 @@ static enum extension_result handle_extension(const char *var,
 		}
 		data->hash_algo = format;
 		return EXTENSION_OK;
+	} else if (!strcmp(ext, "compatobjectformat")) {
+		struct string_list_item *item;
+		int format;
+
+		if (!value)
+			return config_error_nonbool(var);
+		format = hash_algo_by_name(value);
+		if (format == GIT_HASH_UNKNOWN)
+			return error(_("invalid value for '%s': '%s'"),
+				     "extensions.compatobjectformat", value);
+		/* For now only support compatObjectFormat being specified once. */
+		for_each_string_list_item(item, &data->v1_only_extensions) {
+			if (!strcmp(item->string, "compatobjectformat"))
+				return error(_("'%s' already specified as '%s'"),
+					"extensions.compatobjectformat",
+					hash_algos[data->compat_hash_algo].name);
+		}
+		data->compat_hash_algo = format;
+		return EXTENSION_OK;
 	}
 	return EXTENSION_UNKNOWN;
 }
@@ -1573,7 +1592,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
 			repo_set_compat_hash_algo(the_repository,
-						  GIT_HASH_UNKNOWN);
+						  repo_fmt.compat_hash_algo);
 			the_repository->repository_format_worktree_config =
 				repo_fmt.worktree_config;
 			/* take ownership of repo_fmt.partial_clone */
@@ -1667,7 +1686,7 @@ void check_repository_format(struct repository_format *fmt)
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	repo_set_hash_algo(the_repository, fmt->hash_algo);
-	repo_set_compat_hash_algo(the_repository, GIT_HASH_UNKNOWN);
+	repo_set_compat_hash_algo(the_repository, fmt->compat_hash_algo);
 	the_repository->repository_format_worktree_config =
 		fmt->worktree_config;
 	the_repository->repository_format_partial_clone =
diff --git a/setup.h b/setup.h
index 58fd2605dd26..5d678ceb8caa 100644
--- a/setup.h
+++ b/setup.h
@@ -86,6 +86,7 @@ struct repository_format {
 	int worktree_config;
 	int is_bare;
 	int hash_algo;
+	int compat_hash_algo;
 	int sparse_index;
 	char *work_tree;
 	struct string_list unknown_extensions;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 22/30] rev-parse: Add an --output-object-format parameter
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (20 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 21/30] repository: Implement extensions.compatObjectFormat Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 23/30] builtin/cat-file: Let the oid determine the output algorithm Eric W. Biederman
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

The new --output-object-format parameter returns the oid in the
specified format.

This is a generally useful plumbing facility.  It is useful for writing
test cases and for directly querying the translation maps.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Documentation/git-rev-parse.txt | 12 ++++++++++++
 builtin/rev-parse.c             | 23 +++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt
index f26a7591e373..f0f9021f2a5a 100644
--- a/Documentation/git-rev-parse.txt
+++ b/Documentation/git-rev-parse.txt
@@ -159,6 +159,18 @@ for another option.
 	unfortunately named tag "master"), and show them as full
 	refnames (e.g. "refs/heads/master").
 
+--output-object-format=(sha1|sha256|storage)::
+
+	Allow oids to be input from any object format that the current
+	repository supports.
+
+	Specifying "sha1" translates if necessary and returns a sha1 oid.
+
+	Specifying "sha256" translates if necessary and returns a sha256 oid.
+
+	Specifying "storage" translates if necessary and returns an oid in
+	encoded in the storage hash algorithm.
+
 Options for Objects
 ~~~~~~~~~~~~~~~~~~~
 
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 43e96765400c..0ef3e658cc5b 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -25,6 +25,7 @@
 #include "submodule.h"
 #include "commit-reach.h"
 #include "shallow.h"
+#include "object-file-convert.h"
 
 #define DO_REVS		1
 #define DO_NOREV	2
@@ -675,6 +676,8 @@ static void print_path(const char *path, const char *prefix, enum format_type fo
 int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 {
 	int i, as_is = 0, verify = 0, quiet = 0, revs_count = 0, type = 0;
+	const struct git_hash_algo *output_algo = NULL;
+	const struct git_hash_algo *compat = NULL;
 	int did_repo_setup = 0;
 	int has_dashdash = 0;
 	int output_prefix = 0;
@@ -746,6 +749,7 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 
 			prepare_repo_settings(the_repository);
 			the_repository->settings.command_requires_full_index = 0;
+			compat = the_repository->compat_hash_algo;
 		}
 
 		if (!strcmp(arg, "--")) {
@@ -833,6 +837,22 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				flags |= GET_OID_QUIETLY;
 				continue;
 			}
+			if (opt_with_value(arg, "--output-object-format", &arg)) {
+				if (!arg)
+					die(_("no object format specified"));
+				if (!strcmp(arg, the_hash_algo->name) ||
+				    !strcmp(arg, "storage")) {
+					flags |= GET_OID_HASH_ANY;
+					output_algo = the_hash_algo;
+					continue;
+				}
+				else if (compat && !strcmp(arg, compat->name)) {
+					flags |= GET_OID_HASH_ANY;
+					output_algo = compat;
+					continue;
+				}
+				else die(_("unsupported object format: %s"), arg);
+			}
 			if (opt_with_value(arg, "--short", &arg)) {
 				filter &= ~(DO_FLAGS|DO_NOREV);
 				verify = 1;
@@ -1083,6 +1103,9 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 		}
 		if (!get_oid_with_context(the_repository, name,
 					  flags, &oid, &unused)) {
+			if (output_algo)
+				repo_oid_to_algop(the_repository, &oid,
+						  output_algo, &oid);
 			if (verify)
 				revs_count++;
 			else
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 23/30] builtin/cat-file:  Let the oid determine the output algorithm
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (21 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 22/30] rev-parse: Add an --output-object-format parameter Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 24/30] tree-walk: init_tree_desc take an oid to get the hash algorithm Eric W. Biederman
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Use GET_OID_HASH_ANY when calling get_oid_with_context.  This
implements the semi-obvious behaviour that specifying a sha1 oid shows
the output for a sha1 encoded object, and specifying a sha256 oid
shows the output for a sha256 encoded object.

This is useful for testing the the conversion of an object to an
equivalent object encoded with a different hash function.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/cat-file.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 694c8538df2f..e615d1f8e0da 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -107,7 +107,10 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	struct object_info oi = OBJECT_INFO_INIT;
 	struct strbuf sb = STRBUF_INIT;
 	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
-	unsigned get_oid_flags = GET_OID_RECORD_PATH | GET_OID_ONLY_TO_DIE;
+	unsigned get_oid_flags =
+		GET_OID_RECORD_PATH |
+		GET_OID_ONLY_TO_DIE |
+		GET_OID_HASH_ANY;
 	const char *path = force_path;
 	const int opt_cw = (opt == 'c' || opt == 'w');
 	if (!path && opt_cw)
@@ -223,7 +226,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 								     &size);
 				const char *target;
 				if (!skip_prefix(buffer, "object ", &target) ||
-				    get_oid_hex(target, &blob_oid))
+				    get_oid_hex_algop(target, &blob_oid,
+						      &hash_algos[oid.algo]))
 					die("%s not a valid tag", oid_to_hex(&oid));
 				free(buffer);
 			} else
@@ -512,7 +516,9 @@ static void batch_one_object(const char *obj_name,
 			     struct expand_data *data)
 {
 	struct object_context ctx;
-	int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
+	int flags =
+		GET_OID_HASH_ANY |
+		(opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0);
 	enum get_oid_result result;
 
 	result = get_oid_with_context(the_repository, obj_name,
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 24/30] tree-walk: init_tree_desc take an oid to get the hash algorithm
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (22 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 23/30] builtin/cat-file: Let the oid determine the output algorithm Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 25/30] object-file: Handle compat objects in check_object_signature Eric W. Biederman
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

To make it possible for git ls-tree to display the tree encoded
in the hash algorithm of the oid specified to git ls-tree, update
init_tree_desc to take as a parameter the oid of the tree object.

Update all callers of init_tree_desc and init_tree_desc_gently
to pass the oid of the tree object.

Use the oid of the tree object to discover the hash algorithm
of the oid and store that hash algorithm in struct tree_desc.

Use the hash algorithm in decode_tree_entry and
update_tree_entry_internal to handle reading a tree object encoded in
a hash algorithm that differs from the repositories hash algorithm.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 archive.c              |  3 ++-
 builtin/am.c           |  6 +++---
 builtin/checkout.c     |  8 +++++---
 builtin/clone.c        |  2 +-
 builtin/commit.c       |  2 +-
 builtin/grep.c         |  8 ++++----
 builtin/merge.c        |  3 ++-
 builtin/pack-objects.c |  6 ++++--
 builtin/read-tree.c    |  2 +-
 builtin/stash.c        |  5 +++--
 cache-tree.c           |  2 +-
 delta-islands.c        |  2 +-
 diff-lib.c             |  2 +-
 fsck.c                 |  6 ++++--
 http-push.c            |  2 +-
 list-objects.c         |  2 +-
 match-trees.c          |  4 ++--
 merge-ort.c            | 11 ++++++-----
 merge-recursive.c      |  2 +-
 merge.c                |  3 ++-
 pack-bitmap-write.c    |  2 +-
 packfile.c             |  3 ++-
 reflog.c               |  2 +-
 revision.c             |  4 ++--
 tree-walk.c            | 36 +++++++++++++++++++++---------------
 tree-walk.h            |  7 +++++--
 tree.c                 |  2 +-
 walker.c               |  2 +-
 28 files changed, 80 insertions(+), 59 deletions(-)

diff --git a/archive.c b/archive.c
index ca11db185b15..b10269aee7be 100644
--- a/archive.c
+++ b/archive.c
@@ -339,7 +339,8 @@ int write_archive_entries(struct archiver_args *args,
 		opts.src_index = args->repo->index;
 		opts.dst_index = args->repo->index;
 		opts.fn = oneway_merge;
-		init_tree_desc(&t, args->tree->buffer, args->tree->size);
+		init_tree_desc(&t, &args->tree->object.oid,
+			       args->tree->buffer, args->tree->size);
 		if (unpack_trees(1, &t, &opts))
 			return -1;
 		git_attr_set_direction(GIT_ATTR_INDEX);
diff --git a/builtin/am.c b/builtin/am.c
index 8bde034fae68..4dfd714b910e 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1991,8 +1991,8 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
 	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
 	opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	opts.fn = twoway_merge;
-	init_tree_desc(&t[0], head->buffer, head->size);
-	init_tree_desc(&t[1], remote->buffer, remote->size);
+	init_tree_desc(&t[0], &head->object.oid, head->buffer, head->size);
+	init_tree_desc(&t[1], &remote->object.oid, remote->buffer, remote->size);
 
 	if (unpack_trees(2, t, &opts)) {
 		rollback_lock_file(&lock_file);
@@ -2026,7 +2026,7 @@ static int merge_tree(struct tree *tree)
 	opts.dst_index = &the_index;
 	opts.merge = 1;
 	opts.fn = oneway_merge;
-	init_tree_desc(&t[0], tree->buffer, tree->size);
+	init_tree_desc(&t[0], &tree->object.oid, tree->buffer, tree->size);
 
 	if (unpack_trees(1, t, &opts)) {
 		rollback_lock_file(&lock_file);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f53612f46870..03eff73fd031 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -701,7 +701,7 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 			       info->commit ? &info->commit->object.oid : null_oid(),
 			       NULL);
 	parse_tree(tree);
-	init_tree_desc(&tree_desc, tree->buffer, tree->size);
+	init_tree_desc(&tree_desc, &tree->object.oid, tree->buffer, tree->size);
 	switch (unpack_trees(1, &tree_desc, &opts)) {
 	case -2:
 		*writeout_error = 1;
@@ -815,10 +815,12 @@ static int merge_working_tree(const struct checkout_opts *opts,
 			die(_("unable to parse commit %s"),
 				oid_to_hex(old_commit_oid));
 
-		init_tree_desc(&trees[0], tree->buffer, tree->size);
+		init_tree_desc(&trees[0], &tree->object.oid,
+			       tree->buffer, tree->size);
 		parse_tree(new_tree);
 		tree = new_tree;
-		init_tree_desc(&trees[1], tree->buffer, tree->size);
+		init_tree_desc(&trees[1], &tree->object.oid,
+			       tree->buffer, tree->size);
 
 		ret = unpack_trees(2, trees, &topts);
 		clear_unpack_trees_porcelain(&topts);
diff --git a/builtin/clone.c b/builtin/clone.c
index c6357af94989..79ceefb93995 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -737,7 +737,7 @@ static int checkout(int submodule_progress, int filter_submodules)
 	if (!tree)
 		die(_("unable to parse commit %s"), oid_to_hex(&oid));
 	parse_tree(tree);
-	init_tree_desc(&t, tree->buffer, tree->size);
+	init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts) < 0)
 		die(_("unable to checkout working tree"));
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 7da5f924484d..537319932b65 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -340,7 +340,7 @@ static void create_base_index(const struct commit *current_head)
 	if (!tree)
 		die(_("failed to unpack HEAD tree object"));
 	parse_tree(tree);
-	init_tree_desc(&t, tree->buffer, tree->size);
+	init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts))
 		exit(128); /* We've already reported the error, finish dying */
 }
diff --git a/builtin/grep.c b/builtin/grep.c
index 50e712a18479..0c2b8a376f8e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -530,7 +530,7 @@ static int grep_submodule(struct grep_opt *opt,
 		strbuf_addstr(&base, filename);
 		strbuf_addch(&base, '/');
 
-		init_tree_desc(&tree, data, size);
+		init_tree_desc(&tree, oid, data, size);
 		hit = grep_tree(&subopt, pathspec, &tree, &base, base.len,
 				object_type == OBJ_COMMIT);
 		strbuf_release(&base);
@@ -574,7 +574,7 @@ static int grep_cache(struct grep_opt *opt,
 
 			data = repo_read_object_file(the_repository, &ce->oid,
 						     &type, &size);
-			init_tree_desc(&tree, data, size);
+			init_tree_desc(&tree, &ce->oid, data, size);
 
 			hit |= grep_tree(opt, pathspec, &tree, &name, 0, 0);
 			strbuf_setlen(&name, name_base_len);
@@ -670,7 +670,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 				    oid_to_hex(&entry.oid));
 
 			strbuf_addch(base, '/');
-			init_tree_desc(&sub, data, size);
+			init_tree_desc(&sub, &entry.oid, data, size);
 			hit |= grep_tree(opt, pathspec, &sub, base, tn_len,
 					 check_attr);
 			free(data);
@@ -714,7 +714,7 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 			strbuf_add(&base, name, len);
 			strbuf_addch(&base, ':');
 		}
-		init_tree_desc(&tree, data, size);
+		init_tree_desc(&tree, &obj->oid, data, size);
 		hit = grep_tree(opt, pathspec, &tree, &base, base.len,
 				obj->type == OBJ_COMMIT);
 		strbuf_release(&base);
diff --git a/builtin/merge.c b/builtin/merge.c
index de68910177fb..718165d45917 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -704,7 +704,8 @@ static int read_tree_trivial(struct object_id *common, struct object_id *head,
 	cache_tree_free(&the_index.cache_tree);
 	for (i = 0; i < nr_trees; i++) {
 		parse_tree(trees[i]);
-		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
+		init_tree_desc(t+i, &trees[i]->object.oid,
+			       trees[i]->buffer, trees[i]->size);
 	}
 	if (unpack_trees(nr_trees, t, &opts))
 		return -1;
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d2a162d52804..d34902002656 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1756,7 +1756,8 @@ static void add_pbase_object(struct tree_desc *tree,
 			tree = pbase_tree_get(&entry.oid);
 			if (!tree)
 				return;
-			init_tree_desc(&sub, tree->tree_data, tree->tree_size);
+			init_tree_desc(&sub, &tree->oid,
+				       tree->tree_data, tree->tree_size);
 
 			add_pbase_object(&sub, down, downlen, fullname);
 			pbase_tree_put(tree);
@@ -1816,7 +1817,8 @@ static void add_preferred_base_object(const char *name)
 		}
 		else {
 			struct tree_desc tree;
-			init_tree_desc(&tree, it->pcache.tree_data, it->pcache.tree_size);
+			init_tree_desc(&tree, &it->pcache.oid,
+				       it->pcache.tree_data, it->pcache.tree_size);
 			add_pbase_object(&tree, name, cmplen, name);
 		}
 	}
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 1fec702a04fa..24d6d156d3a2 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -264,7 +264,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	for (i = 0; i < nr_trees; i++) {
 		struct tree *tree = trees[i];
 		parse_tree(tree);
-		init_tree_desc(t+i, tree->buffer, tree->size);
+		init_tree_desc(t+i, &tree->object.oid, tree->buffer, tree->size);
 	}
 	if (unpack_trees(nr_trees, t, &opts))
 		return 128;
diff --git a/builtin/stash.c b/builtin/stash.c
index fe64cde9ce30..9ee52af4d28e 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -285,7 +285,7 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
 	if (parse_tree(tree))
 		return -1;
 
-	init_tree_desc(t, tree->buffer, tree->size);
+	init_tree_desc(t, &tree->object.oid, tree->buffer, tree->size);
 
 	opts.head_idx = 1;
 	opts.src_index = &the_index;
@@ -871,7 +871,8 @@ static void diff_include_untracked(const struct stash_info *info, struct diff_op
 		tree[i] = parse_tree_indirect(oid[i]);
 		if (parse_tree(tree[i]) < 0)
 			die(_("failed to parse tree"));
-		init_tree_desc(&tree_desc[i], tree[i]->buffer, tree[i]->size);
+		init_tree_desc(&tree_desc[i], &tree[i]->object.oid,
+			       tree[i]->buffer, tree[i]->size);
 	}
 
 	unpack_tree_opt.head_idx = -1;
diff --git a/cache-tree.c b/cache-tree.c
index ddc7d3d86959..334973a01cee 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -770,7 +770,7 @@ static void prime_cache_tree_rec(struct repository *r,
 
 	oidcpy(&it->oid, &tree->object.oid);
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	cnt = 0;
 	while (tree_entry(&desc, &entry)) {
 		if (!S_ISDIR(entry.mode))
diff --git a/delta-islands.c b/delta-islands.c
index 5de5759f3f13..1ff3506b10f2 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -289,7 +289,7 @@ void resolve_tree_islands(struct repository *r,
 		if (!tree || parse_tree(tree) < 0)
 			die(_("bad tree object %s"), oid_to_hex(&ent->idx.oid));
 
-		init_tree_desc(&desc, tree->buffer, tree->size);
+		init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 		while (tree_entry(&desc, &entry)) {
 			struct object *obj;
 
diff --git a/diff-lib.c b/diff-lib.c
index 6b0c6a7180cc..add323f5628d 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -558,7 +558,7 @@ static int diff_cache(struct rev_info *revs,
 	opts.pathspec = &revs->diffopt.pathspec;
 	opts.pathspec->recursive = 1;
 
-	init_tree_desc(&t, tree->buffer, tree->size);
+	init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size);
 	return unpack_trees(1, &t, &opts);
 }
 
diff --git a/fsck.c b/fsck.c
index 2b1e348005b7..6b492a48da82 100644
--- a/fsck.c
+++ b/fsck.c
@@ -313,7 +313,8 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 		return -1;
 
 	name = fsck_get_object_name(options, &tree->object.oid);
-	if (init_tree_desc_gently(&desc, tree->buffer, tree->size, 0))
+	if (init_tree_desc_gently(&desc, &tree->object.oid,
+				  tree->buffer, tree->size, 0))
 		return -1;
 	while (tree_entry_gently(&desc, &entry)) {
 		struct object *obj;
@@ -583,7 +584,8 @@ static int fsck_tree(const struct object_id *tree_oid,
 	const char *o_name;
 	struct name_stack df_dup_candidates = { NULL };
 
-	if (init_tree_desc_gently(&desc, buffer, size, TREE_DESC_RAW_MODES)) {
+	if (init_tree_desc_gently(&desc, tree_oid, buffer, size,
+				  TREE_DESC_RAW_MODES)) {
 		retval += report(options, tree_oid, OBJ_TREE,
 				 FSCK_MSG_BAD_TREE,
 				 "cannot be parsed as a tree");
diff --git a/http-push.c b/http-push.c
index a704f490fdb2..81c35b5e96f7 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1308,7 +1308,7 @@ static struct object_list **process_tree(struct tree *tree,
 	obj->flags |= SEEN;
 	p = add_one_object(obj, p);
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry))
 		switch (object_type(entry.mode)) {
diff --git a/list-objects.c b/list-objects.c
index e60a6cd5b46e..312335c8a7f2 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -97,7 +97,7 @@ static void process_tree_contents(struct traversal_context *ctx,
 	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
 		all_entries_interesting : entry_not_interesting;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry)) {
 		if (match != all_entries_interesting) {
diff --git a/match-trees.c b/match-trees.c
index 0885ac681cd5..3412b6a1401d 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -63,7 +63,7 @@ static void *fill_tree_desc_strict(struct tree_desc *desc,
 		die("unable to read tree (%s)", oid_to_hex(hash));
 	if (type != OBJ_TREE)
 		die("%s is not a tree", oid_to_hex(hash));
-	init_tree_desc(desc, buffer, size);
+	init_tree_desc(desc, hash, buffer, size);
 	return buffer;
 }
 
@@ -194,7 +194,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	buf = repo_read_object_file(the_repository, oid1, &type, &sz);
 	if (!buf)
 		die("cannot read tree %s", oid_to_hex(oid1));
-	init_tree_desc(&desc, buf, sz);
+	init_tree_desc(&desc, oid1, buf, sz);
 
 	rewrite_here = NULL;
 	while (desc.size) {
diff --git a/merge-ort.c b/merge-ort.c
index 8631c997002d..3a5729c91e48 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1679,9 +1679,10 @@ static int collect_merge_info(struct merge_options *opt,
 	parse_tree(merge_base);
 	parse_tree(side1);
 	parse_tree(side2);
-	init_tree_desc(t + 0, merge_base->buffer, merge_base->size);
-	init_tree_desc(t + 1, side1->buffer, side1->size);
-	init_tree_desc(t + 2, side2->buffer, side2->size);
+	init_tree_desc(t + 0, &merge_base->object.oid,
+		       merge_base->buffer, merge_base->size);
+	init_tree_desc(t + 1, &side1->object.oid, side1->buffer, side1->size);
+	init_tree_desc(t + 2, &side2->object.oid, side2->buffer, side2->size);
 
 	trace2_region_enter("merge", "traverse_trees", opt->repo);
 	ret = traverse_trees(NULL, 3, t, &info);
@@ -4400,9 +4401,9 @@ static int checkout(struct merge_options *opt,
 	unpack_opts.fn = twoway_merge;
 	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore */
 	parse_tree(prev);
-	init_tree_desc(&trees[0], prev->buffer, prev->size);
+	init_tree_desc(&trees[0], &prev->object.oid, prev->buffer, prev->size);
 	parse_tree(next);
-	init_tree_desc(&trees[1], next->buffer, next->size);
+	init_tree_desc(&trees[1], &next->object.oid, next->buffer, next->size);
 
 	ret = unpack_trees(2, trees, &unpack_opts);
 	clear_unpack_trees_porcelain(&unpack_opts);
diff --git a/merge-recursive.c b/merge-recursive.c
index 6a4081bb0f52..93df9eecdd95 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -411,7 +411,7 @@ static inline int merge_detect_rename(struct merge_options *opt)
 static void init_tree_desc_from_tree(struct tree_desc *desc, struct tree *tree)
 {
 	parse_tree(tree);
-	init_tree_desc(desc, tree->buffer, tree->size);
+	init_tree_desc(desc, &tree->object.oid, tree->buffer, tree->size);
 }
 
 static int unpack_trees_start(struct merge_options *opt,
diff --git a/merge.c b/merge.c
index b60925459c29..86179c34102d 100644
--- a/merge.c
+++ b/merge.c
@@ -81,7 +81,8 @@ int checkout_fast_forward(struct repository *r,
 	}
 	for (i = 0; i < nr_trees; i++) {
 		parse_tree(trees[i]);
-		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
+		init_tree_desc(t+i, &trees[i]->object.oid,
+			       trees[i]->buffer, trees[i]->size);
 	}
 
 	memset(&opts, 0, sizeof(opts));
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index f6757c3cbf20..9211e08f0127 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -366,7 +366,7 @@ static int fill_bitmap_tree(struct bitmap *bitmap,
 	if (parse_tree(tree) < 0)
 		die("unable to load tree object %s",
 		    oid_to_hex(&tree->object.oid));
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
diff --git a/packfile.c b/packfile.c
index 9cc0a2e37a83..1fae0fcdd9e7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2250,7 +2250,8 @@ static int add_promisor_object(const struct object_id *oid,
 		struct tree *tree = (struct tree *)obj;
 		struct tree_desc desc;
 		struct name_entry entry;
-		if (init_tree_desc_gently(&desc, tree->buffer, tree->size, 0))
+		if (init_tree_desc_gently(&desc, &tree->object.oid,
+					  tree->buffer, tree->size, 0))
 			/*
 			 * Error messages are given when packs are
 			 * verified, so do not print any here.
diff --git a/reflog.c b/reflog.c
index 9ad50e7d93e4..c6992a19268f 100644
--- a/reflog.c
+++ b/reflog.c
@@ -40,7 +40,7 @@ static int tree_is_complete(const struct object_id *oid)
 		tree->buffer = data;
 		tree->size = size;
 	}
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	complete = 1;
 	while (tree_entry(&desc, &entry)) {
 		if (!repo_has_object_file(the_repository, &entry.oid) ||
diff --git a/revision.c b/revision.c
index 2f4c53ea207b..a60dfc23a2a5 100644
--- a/revision.c
+++ b/revision.c
@@ -82,7 +82,7 @@ static void mark_tree_contents_uninteresting(struct repository *r,
 	if (parse_tree_gently(tree, 1) < 0)
 		return;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
@@ -189,7 +189,7 @@ static void add_children_by_path(struct repository *r,
 	if (parse_tree_gently(tree, 1) < 0)
 		return;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
diff --git a/tree-walk.c b/tree-walk.c
index 3af50a01c2c7..0b44ec7c75ff 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -15,7 +15,7 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 	const char *path;
 	unsigned int len;
 	uint16_t mode;
-	const unsigned hashsz = the_hash_algo->rawsz;
+	const unsigned hashsz = desc->algo->rawsz;
 
 	if (size < hashsz + 3 || buf[size - (hashsz + 1)]) {
 		strbuf_addstr(err, _("too-short tree object"));
@@ -37,15 +37,19 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 	desc->entry.path = path;
 	desc->entry.mode = (desc->flags & TREE_DESC_RAW_MODES) ? mode : canon_mode(mode);
 	desc->entry.pathlen = len - 1;
-	oidread(&desc->entry.oid, (const unsigned char *)path + len);
+	oidread_algop(&desc->entry.oid, (const unsigned char *)path + len,
+		      desc->algo);
 
 	return 0;
 }
 
-static int init_tree_desc_internal(struct tree_desc *desc, const void *buffer,
-				   unsigned long size, struct strbuf *err,
+static int init_tree_desc_internal(struct tree_desc *desc,
+				   const struct object_id *oid,
+				   const void *buffer, unsigned long size,
+				   struct strbuf *err,
 				   enum tree_desc_flags flags)
 {
+	desc->algo = (oid && oid->algo) ? &hash_algos[oid->algo] : the_hash_algo;
 	desc->buffer = buffer;
 	desc->size = size;
 	desc->flags = flags;
@@ -54,19 +58,21 @@ static int init_tree_desc_internal(struct tree_desc *desc, const void *buffer,
 	return 0;
 }
 
-void init_tree_desc(struct tree_desc *desc, const void *buffer, unsigned long size)
+void init_tree_desc(struct tree_desc *desc, const struct object_id *tree_oid,
+		    const void *buffer, unsigned long size)
 {
 	struct strbuf err = STRBUF_INIT;
-	if (init_tree_desc_internal(desc, buffer, size, &err, 0))
+	if (init_tree_desc_internal(desc, tree_oid, buffer, size, &err, 0))
 		die("%s", err.buf);
 	strbuf_release(&err);
 }
 
-int init_tree_desc_gently(struct tree_desc *desc, const void *buffer, unsigned long size,
+int init_tree_desc_gently(struct tree_desc *desc, const struct object_id *oid,
+			  const void *buffer, unsigned long size,
 			  enum tree_desc_flags flags)
 {
 	struct strbuf err = STRBUF_INIT;
-	int result = init_tree_desc_internal(desc, buffer, size, &err, flags);
+	int result = init_tree_desc_internal(desc, oid, buffer, size, &err, flags);
 	if (result)
 		error("%s", err.buf);
 	strbuf_release(&err);
@@ -85,7 +91,7 @@ void *fill_tree_descriptor(struct repository *r,
 		if (!buf)
 			die("unable to read tree %s", oid_to_hex(oid));
 	}
-	init_tree_desc(desc, buf, size);
+	init_tree_desc(desc, oid, buf, size);
 	return buf;
 }
 
@@ -102,7 +108,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a)
 static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err)
 {
 	const void *buf = desc->buffer;
-	const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + the_hash_algo->rawsz;
+	const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + desc->algo->rawsz;
 	unsigned long size = desc->size;
 	unsigned long len = end - (const unsigned char *)buf;
 
@@ -611,7 +617,7 @@ int get_tree_entry(struct repository *r,
 		retval = -1;
 	} else {
 		struct tree_desc t;
-		init_tree_desc(&t, tree, size);
+		init_tree_desc(&t, tree_oid, tree, size);
 		retval = find_tree_entry(r, &t, name, oid, mode);
 	}
 	free(tree);
@@ -654,7 +660,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 	struct tree_desc t;
 	int follows_remaining = GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS;
 
-	init_tree_desc(&t, NULL, 0UL);
+	init_tree_desc(&t, NULL, NULL, 0UL);
 	strbuf_addstr(&namebuf, name);
 	oidcpy(&current_tree_oid, tree_oid);
 
@@ -690,7 +696,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 				goto done;
 
 			/* descend */
-			init_tree_desc(&t, tree, size);
+			init_tree_desc(&t, &current_tree_oid, tree, size);
 		}
 
 		/* Handle symlinks to e.g. a//b by removing leading slashes */
@@ -724,7 +730,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 			free(parent->tree);
 			parents_nr--;
 			parent = &parents[parents_nr - 1];
-			init_tree_desc(&t, parent->tree, parent->size);
+			init_tree_desc(&t, &parent->oid, parent->tree, parent->size);
 			strbuf_remove(&namebuf, 0, remainder ? 3 : 2);
 			continue;
 		}
@@ -804,7 +810,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 			contents_start = contents;
 
 			parent = &parents[parents_nr - 1];
-			init_tree_desc(&t, parent->tree, parent->size);
+			init_tree_desc(&t, &parent->oid, parent->tree, parent->size);
 			strbuf_splice(&namebuf, 0, len,
 				      contents_start, link_len);
 			if (remainder)
diff --git a/tree-walk.h b/tree-walk.h
index 74cdceb3fed2..cf54d01019e9 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -26,6 +26,7 @@ struct name_entry {
  * A semi-opaque data structure used to maintain the current state of the walk.
  */
 struct tree_desc {
+	const struct git_hash_algo *algo;
 	/*
 	 * pointer into the memory representation of the tree. It always
 	 * points at the current entry being visited.
@@ -85,9 +86,11 @@ int update_tree_entry_gently(struct tree_desc *);
  * size parameters are assumed to be the same as the buffer and size
  * members of `struct tree`.
  */
-void init_tree_desc(struct tree_desc *desc, const void *buf, unsigned long size);
+void init_tree_desc(struct tree_desc *desc, const struct object_id *tree_oid,
+		    const void *buf, unsigned long size);
 
-int init_tree_desc_gently(struct tree_desc *desc, const void *buf, unsigned long size,
+int init_tree_desc_gently(struct tree_desc *desc, const struct object_id *oid,
+			  const void *buf, unsigned long size,
 			  enum tree_desc_flags flags);
 
 /*
diff --git a/tree.c b/tree.c
index c745462f968e..44bcf728f10a 100644
--- a/tree.c
+++ b/tree.c
@@ -27,7 +27,7 @@ int read_tree_at(struct repository *r,
 	if (parse_tree(tree))
 		return -1;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry)) {
 		if (retval != all_entries_interesting) {
diff --git a/walker.c b/walker.c
index 65002a7220ad..c0fd632d921c 100644
--- a/walker.c
+++ b/walker.c
@@ -45,7 +45,7 @@ static int process_tree(struct walker *walker, struct tree *tree)
 	if (parse_tree(tree))
 		return -1;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	while (tree_entry(&desc, &entry)) {
 		struct object *obj = NULL;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 25/30] object-file: Handle compat objects in check_object_signature
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (23 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 24/30] tree-walk: init_tree_desc take an oid to get the hash algorithm Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 26/30] builtin/ls-tree: Let the oid determine the output algorithm Eric W. Biederman
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Update check_object_signature to find the hash algorithm the exising
signature uses, and to use the same hash algorithm when recomputing it
to check the signature is valid.

This will be useful when teaching git ls-tree to display objects
encoded with the compat hash algorithm.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 object-file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 1601d624c9fd..df49d2239f24 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1094,9 +1094,11 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size,
 			   enum object_type type)
 {
+	const struct git_hash_algo *algo =
+		oid->algo ? &hash_algos[oid->algo] : r->hash_algo;
 	struct object_id real_oid;
 
-	hash_object_file(r->hash_algo, buf, size, type, &real_oid);
+	hash_object_file(algo, buf, size, type, &real_oid);
 
 	return !oideq(oid, &real_oid) ? -1 : 0;
 }
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 26/30] builtin/ls-tree: Let the oid determine the output algorithm
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (24 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 25/30] object-file: Handle compat objects in check_object_signature Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 27/30] test-lib: Compute the compatibility hash so tests may use it Eric W. Biederman
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Update cmd_ls_tree to call get_oid_with_context and pass
GET_OID_HASH_ANY instead of calling the simpler repo_get_oid.

This implments in ls-tree the behavior that asking to display a sha1
hash displays the corrresponding sha1 encoded object and asking to
display a sha256 hash displayes the corresponding sha256 encoded
object.

This is useful for testing the conversion of an object to an
equivlanet object encoded with a different hash function.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/ls-tree.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index f558db5f3b80..71281ab705b6 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -376,6 +376,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 		OPT_END()
 	};
 	struct ls_tree_cmdmode_to_fmt *m2f = ls_tree_cmdmode_format;
+	struct object_context obj_context;
 	int ret;
 
 	git_config(git_default_config, NULL);
@@ -407,7 +408,9 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 			ls_tree_usage, ls_tree_options);
 	if (argc < 1)
 		usage_with_options(ls_tree_usage, ls_tree_options);
-	if (repo_get_oid(the_repository, argv[0], &oid))
+	if (get_oid_with_context(the_repository, argv[0],
+				 GET_OID_HASH_ANY, &oid,
+				 &obj_context))
 		die("Not a valid object name %s", argv[0]);
 
 	/*
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 27/30] test-lib: Compute the compatibility hash so tests may use it
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (25 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 26/30] builtin/ls-tree: Let the oid determine the output algorithm Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 28/30] t1006: Rename sha1 to oid Eric W. Biederman
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/test-lib-functions.sh | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 2f8868caa171..92b462e2e711 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1599,7 +1599,16 @@ test_set_hash () {
 
 # Detect the hash algorithm in use.
 test_detect_hash () {
-	test_hash_algo="${GIT_TEST_DEFAULT_HASH:-sha1}"
+	case "$GIT_TEST_DEFAULT_HASH" in
+	"sha256")
+	    test_hash_algo=sha256
+	    test_compat_hash_algo=sha1
+	    ;;
+	*)
+	    test_hash_algo=sha1
+	    test_compat_hash_algo=sha256
+	    ;;
+	esac
 }
 
 # Load common hash metadata and common placeholder object IDs for use with
@@ -1651,6 +1660,12 @@ test_oid () {
 	local algo="${test_hash_algo}" &&
 
 	case "$1" in
+	--hash=storage)
+		algo="$test_hash_algo" &&
+		shift;;
+	--hash=compat)
+		algo="$test_compat_hash_algo" &&
+		shift;;
 	--hash=*)
 		algo="${1#--hash=}" &&
 		shift;;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 28/30] t1006: Rename sha1 to oid
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (26 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 27/30] test-lib: Compute the compatibility hash so tests may use it Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 29/30] t1006: Test oid compatibility with cat-file Eric W. Biederman
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Before I extend this test, changing the naming of the relevant
hash from sha1 to oid.  Calling the hash sha1 is incorrect today
as it can be either sha1 or sha256 depending on the value of
GIT_DEFAULT_HASH_FUNCTION when the test is called.

I plan to test sha1 and sha256 simultaneously in the same repository.
Having a name like sha1 will be even more confusing.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/t1006-cat-file.sh | 220 ++++++++++++++++++++++----------------------
 1 file changed, 110 insertions(+), 110 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index d73a0be1b9d1..9b018b538950 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -112,65 +112,65 @@ strlen () {
 
 run_tests () {
     type=$1
-    sha1=$2
+    oid=$2
     size=$3
     content=$4
     pretty_content=$5
 
-    batch_output="$sha1 $type $size
+    batch_output="$oid $type $size
 $content"
 
     test_expect_success "$type exists" '
-	git cat-file -e $sha1
+	git cat-file -e $oid
     '
 
     test_expect_success "Type of $type is correct" '
 	echo $type >expect &&
-	git cat-file -t $sha1 >actual &&
+	git cat-file -t $oid >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "Size of $type is correct" '
 	echo $size >expect &&
-	git cat-file -s $sha1 >actual &&
+	git cat-file -s $oid >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "Type of $type is correct using --allow-unknown-type" '
 	echo $type >expect &&
-	git cat-file -t --allow-unknown-type $sha1 >actual &&
+	git cat-file -t --allow-unknown-type $oid >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "Size of $type is correct using --allow-unknown-type" '
 	echo $size >expect &&
-	git cat-file -s --allow-unknown-type $sha1 >actual &&
+	git cat-file -s --allow-unknown-type $oid >actual &&
 	test_cmp expect actual
     '
 
     test -z "$content" ||
     test_expect_success "Content of $type is correct" '
 	echo_without_newline "$content" >expect &&
-	git cat-file $type $sha1 >actual &&
+	git cat-file $type $oid >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "Pretty content of $type is correct" '
 	echo_without_newline "$pretty_content" >expect &&
-	git cat-file -p $sha1 >actual &&
+	git cat-file -p $oid >actual &&
 	test_cmp expect actual
     '
 
     test -z "$content" ||
     test_expect_success "--batch output of $type is correct" '
 	echo "$batch_output" >expect &&
-	echo $sha1 | git cat-file --batch >actual &&
+	echo $oid | git cat-file --batch >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "--batch-check output of $type is correct" '
-	echo "$sha1 $type $size" >expect &&
-	echo_without_newline $sha1 | git cat-file --batch-check >actual &&
+	echo "$oid $type $size" >expect &&
+	echo_without_newline $oid | git cat-file --batch-check >actual &&
 	test_cmp expect actual
     '
 
@@ -179,33 +179,33 @@ $content"
 	test -z "$content" ||
 		test_expect_success "--batch-command $opt output of $type content is correct" '
 		echo "$batch_output" >expect &&
-		test_write_lines "contents $sha1" | git cat-file --batch-command $opt >actual &&
+		test_write_lines "contents $oid" | git cat-file --batch-command $opt >actual &&
 		test_cmp expect actual
 	'
 
 	test_expect_success "--batch-command $opt output of $type info is correct" '
-		echo "$sha1 $type $size" >expect &&
-		test_write_lines "info $sha1" |
+		echo "$oid $type $size" >expect &&
+		test_write_lines "info $oid" |
 		git cat-file --batch-command $opt >actual &&
 		test_cmp expect actual
 	'
     done
 
     test_expect_success "custom --batch-check format" '
-	echo "$type $sha1" >expect &&
-	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
+	echo "$type $oid" >expect &&
+	echo $oid | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "custom --batch-command format" '
-	echo "$type $sha1" >expect &&
-	echo "info $sha1" | git cat-file --batch-command="%(objecttype) %(objectname)" >actual &&
+	echo "$type $oid" >expect &&
+	echo "info $oid" | git cat-file --batch-command="%(objecttype) %(objectname)" >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success '--batch-check with %(rest)' '
 	echo "$type this is some extra content" >expect &&
-	echo "$sha1    this is some extra content" |
+	echo "$oid    this is some extra content" |
 		git cat-file --batch-check="%(objecttype) %(rest)" >actual &&
 	test_cmp expect actual
     '
@@ -216,7 +216,7 @@ $content"
 		echo "$size" &&
 		echo "$content"
 	} >expect &&
-	echo $sha1 | git cat-file --batch="%(objectsize)" >actual &&
+	echo $oid | git cat-file --batch="%(objectsize)" >actual &&
 	test_cmp expect actual
     '
 
@@ -226,25 +226,25 @@ $content"
 		echo "$type" &&
 		echo "$content"
 	} >expect &&
-	echo $sha1 | git cat-file --batch="%(objecttype)" >actual &&
+	echo $oid | git cat-file --batch="%(objecttype)" >actual &&
 	test_cmp expect actual
     '
 }
 
 hello_content="Hello World"
 hello_size=$(strlen "$hello_content")
-hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
+hello_oid=$(echo_without_newline "$hello_content" | git hash-object --stdin)
 
 test_expect_success "setup" '
 	echo_without_newline "$hello_content" > hello &&
 	git update-index --add hello
 '
 
-run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
+run_tests 'blob' $hello_oid $hello_size "$hello_content" "$hello_content"
 
 test_expect_success '--batch-command --buffer with flush for blob info' '
-	echo "$hello_sha1 blob $hello_size" >expect &&
-	test_write_lines "info $hello_sha1" "flush" |
+	echo "$hello_oid blob $hello_size" >expect &&
+	test_write_lines "info $hello_oid" "flush" |
 	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
 	git cat-file --batch-command --buffer >actual &&
 	test_cmp expect actual
@@ -252,38 +252,38 @@ test_expect_success '--batch-command --buffer with flush for blob info' '
 
 test_expect_success '--batch-command --buffer without flush for blob info' '
 	touch output &&
-	test_write_lines "info $hello_sha1" |
+	test_write_lines "info $hello_oid" |
 	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
 	git cat-file --batch-command --buffer >>output &&
 	test_must_be_empty output
 '
 
 test_expect_success '--batch-check without %(rest) considers whole line' '
-	echo "$hello_sha1 blob $hello_size" >expect &&
-	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
+	echo "$hello_oid blob $hello_size" >expect &&
+	git update-index --add --cacheinfo 100644 $hello_oid "white space" &&
 	test_when_finished "git update-index --remove \"white space\"" &&
 	echo ":white space" | git cat-file --batch-check >actual &&
 	test_cmp expect actual
 '
 
-tree_sha1=$(git write-tree)
+tree_oid=$(git write-tree)
 tree_size=$(($(test_oid rawsz) + 13))
-tree_pretty_content="100644 blob $hello_sha1	hello${LF}"
+tree_pretty_content="100644 blob $hello_oid	hello${LF}"
 
-run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
+run_tests 'tree' $tree_oid $tree_size "" "$tree_pretty_content"
 
 commit_message="Initial commit"
-commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
+commit_oid=$(echo_without_newline "$commit_message" | git commit-tree $tree_oid)
 commit_size=$(($(test_oid hexsz) + 137))
-commit_content="tree $tree_sha1
+commit_content="tree $tree_oid
 author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> $GIT_AUTHOR_DATE
 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
 
 $commit_message"
 
-run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content"
+run_tests 'commit' $commit_oid $commit_size "$commit_content" "$commit_content"
 
-tag_header_without_timestamp="object $hello_sha1
+tag_header_without_timestamp="object $hello_oid
 type blob
 tag hellotag
 tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
@@ -292,14 +292,14 @@ tag_content="$tag_header_without_timestamp 0 +0000
 
 $tag_description"
 
-tag_sha1=$(echo_without_newline "$tag_content" | git hash-object -t tag --stdin -w)
+tag_oid=$(echo_without_newline "$tag_content" | git hash-object -t tag --stdin -w)
 tag_size=$(strlen "$tag_content")
 
-run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content"
+run_tests 'tag' $tag_oid $tag_size "$tag_content" "$tag_content"
 
 test_expect_success "Reach a blob from a tag pointing to it" '
 	echo_without_newline "$hello_content" >expect &&
-	git cat-file blob $tag_sha1 >actual &&
+	git cat-file blob $tag_oid >actual &&
 	test_cmp expect actual
 '
 
@@ -308,31 +308,31 @@ do
     for opt in t s e p
     do
 	test_expect_success "Passing -$opt with --$batch fails" '
-	    test_must_fail git cat-file --$batch -$opt $hello_sha1
+	    test_must_fail git cat-file --$batch -$opt $hello_oid
 	'
 
 	test_expect_success "Passing --$batch with -$opt fails" '
-	    test_must_fail git cat-file -$opt --$batch $hello_sha1
+	    test_must_fail git cat-file -$opt --$batch $hello_oid
 	'
     done
 
     test_expect_success "Passing <type> with --$batch fails" '
-	test_must_fail git cat-file --$batch blob $hello_sha1
+	test_must_fail git cat-file --$batch blob $hello_oid
     '
 
     test_expect_success "Passing --$batch with <type> fails" '
-	test_must_fail git cat-file blob --$batch $hello_sha1
+	test_must_fail git cat-file blob --$batch $hello_oid
     '
 
-    test_expect_success "Passing sha1 with --$batch fails" '
-	test_must_fail git cat-file --$batch $hello_sha1
+    test_expect_success "Passing oid with --$batch fails" '
+	test_must_fail git cat-file --$batch $hello_oid
     '
 done
 
 for opt in t s e p
 do
     test_expect_success "Passing -$opt with --follow-symlinks fails" '
-	    test_must_fail git cat-file --follow-symlinks -$opt $hello_sha1
+	    test_must_fail git cat-file --follow-symlinks -$opt $hello_oid
 	'
 done
 
@@ -360,12 +360,12 @@ test_expect_success "--batch-check for a non-existent hash" '
 
 test_expect_success "--batch for an existent and a non-existent hash" '
 	cat >expect <<-EOF &&
-	$tag_sha1 tag $tag_size
+	$tag_oid tag $tag_size
 	$tag_content
 	0000000000000000000000000000000000000000 missing
 	EOF
 
-	printf "$tag_sha1\n0000000000000000000000000000000000000000" >in &&
+	printf "$tag_oid\n0000000000000000000000000000000000000000" >in &&
 	git cat-file --batch <in >actual &&
 	test_cmp expect actual
 '
@@ -386,74 +386,74 @@ test_expect_success 'empty --batch-check notices missing object' '
 	test_cmp expect actual
 '
 
-batch_input="$hello_sha1
-$commit_sha1
-$tag_sha1
+batch_input="$hello_oid
+$commit_oid
+$tag_oid
 deadbeef
 
 "
 
 printf "%s\0" \
-	"$hello_sha1 blob $hello_size" \
+	"$hello_oid blob $hello_size" \
 	"$hello_content" \
-	"$commit_sha1 commit $commit_size" \
+	"$commit_oid commit $commit_size" \
 	"$commit_content" \
-	"$tag_sha1 tag $tag_size" \
+	"$tag_oid tag $tag_size" \
 	"$tag_content" \
 	"deadbeef missing" \
 	" missing" >batch_output
 
-test_expect_success '--batch with multiple sha1s gives correct format' '
+test_expect_success '--batch with multiple oids gives correct format' '
 	tr "\0" "\n" <batch_output >expect &&
 	echo_without_newline "$batch_input" >in &&
 	git cat-file --batch <in >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success '--batch, -z with multiple sha1s gives correct format' '
+test_expect_success '--batch, -z with multiple oids gives correct format' '
 	echo_without_newline_nul "$batch_input" >in &&
 	tr "\0" "\n" <batch_output >expect &&
 	git cat-file --batch -z <in >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success '--batch, -Z with multiple sha1s gives correct format' '
+test_expect_success '--batch, -Z with multiple oids gives correct format' '
 	echo_without_newline_nul "$batch_input" >in &&
 	git cat-file --batch -Z <in >actual &&
 	test_cmp batch_output actual
 '
 
-batch_check_input="$hello_sha1
-$tree_sha1
-$commit_sha1
-$tag_sha1
+batch_check_input="$hello_oid
+$tree_oid
+$commit_oid
+$tag_oid
 deadbeef
 
 "
 
 printf "%s\0" \
-	"$hello_sha1 blob $hello_size" \
-	"$tree_sha1 tree $tree_size" \
-	"$commit_sha1 commit $commit_size" \
-	"$tag_sha1 tag $tag_size" \
+	"$hello_oid blob $hello_size" \
+	"$tree_oid tree $tree_size" \
+	"$commit_oid commit $commit_size" \
+	"$tag_oid tag $tag_size" \
 	"deadbeef missing" \
 	" missing" >batch_check_output
 
-test_expect_success "--batch-check with multiple sha1s gives correct format" '
+test_expect_success "--batch-check with multiple oids gives correct format" '
 	tr "\0" "\n" <batch_check_output >expect &&
 	echo_without_newline "$batch_check_input" >in &&
 	git cat-file --batch-check <in >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success "--batch-check, -z with multiple sha1s gives correct format" '
+test_expect_success "--batch-check, -z with multiple oids gives correct format" '
 	tr "\0" "\n" <batch_check_output >expect &&
 	echo_without_newline_nul "$batch_check_input" >in &&
 	git cat-file --batch-check -z <in >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success "--batch-check, -Z with multiple sha1s gives correct format" '
+test_expect_success "--batch-check, -Z with multiple oids gives correct format" '
 	echo_without_newline_nul "$batch_check_input" >in &&
 	git cat-file --batch-check -Z <in >actual &&
 	test_cmp batch_check_output actual
@@ -480,18 +480,18 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
 	test_cmp expect actual
 '
 
-batch_command_multiple_info="info $hello_sha1
-info $tree_sha1
-info $commit_sha1
-info $tag_sha1
+batch_command_multiple_info="info $hello_oid
+info $tree_oid
+info $commit_oid
+info $tag_oid
 info deadbeef"
 
 test_expect_success '--batch-command with multiple info calls gives correct format' '
 	cat >expect <<-EOF &&
-	$hello_sha1 blob $hello_size
-	$tree_sha1 tree $tree_size
-	$commit_sha1 commit $commit_size
-	$tag_sha1 tag $tag_size
+	$hello_oid blob $hello_size
+	$tree_oid tree $tree_size
+	$commit_oid commit $commit_size
+	$tag_oid tag $tag_size
 	deadbeef missing
 	EOF
 
@@ -512,19 +512,19 @@ test_expect_success '--batch-command with multiple info calls gives correct form
 	test_cmp expect_nul actual
 '
 
-batch_command_multiple_contents="contents $hello_sha1
-contents $commit_sha1
-contents $tag_sha1
+batch_command_multiple_contents="contents $hello_oid
+contents $commit_oid
+contents $tag_oid
 contents deadbeef
 flush"
 
 test_expect_success '--batch-command with multiple command calls gives correct format' '
 	printf "%s\0" \
-		"$hello_sha1 blob $hello_size" \
+		"$hello_oid blob $hello_size" \
 		"$hello_content" \
-		"$commit_sha1 commit $commit_size" \
+		"$commit_oid commit $commit_size" \
 		"$commit_content" \
-		"$tag_sha1 tag $tag_size" \
+		"$tag_oid tag $tag_size" \
 		"$tag_content" \
 		"deadbeef missing" >expect_nul &&
 	tr "\0" "\n" <expect_nul >expect &&
@@ -569,7 +569,7 @@ test_expect_success 'confirm that neither loose blob is a delta' '
 # we will check only that one of the two objects is a delta
 # against the other, but not the order. We can do so by just
 # asking for the base of both, and checking whether either
-# sha1 appears in the output.
+# oid appears in the output.
 test_expect_success '%(deltabase) reports packed delta bases' '
 	git repack -ad &&
 	git cat-file --batch-check="%(deltabase)" <blobs >actual &&
@@ -583,12 +583,12 @@ test_expect_success 'setup bogus data' '
 	bogus_short_type="bogus" &&
 	bogus_short_content="bogus" &&
 	bogus_short_size=$(strlen "$bogus_short_content") &&
-	bogus_short_sha1=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
+	bogus_short_oid=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
 
 	bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" &&
 	bogus_long_content="bogus" &&
 	bogus_long_size=$(strlen "$bogus_long_content") &&
-	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
+	bogus_long_oid=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
 '
 
 for arg1 in '' --allow-unknown-type
@@ -608,9 +608,9 @@ do
 
 			if test "$arg1" = "--allow-unknown-type"
 			then
-				git cat-file $arg1 $arg2 $bogus_short_sha1
+				git cat-file $arg1 $arg2 $bogus_short_oid
 			else
-				test_must_fail git cat-file $arg1 $arg2 $bogus_short_sha1 >out 2>actual &&
+				test_must_fail git cat-file $arg1 $arg2 $bogus_short_oid >out 2>actual &&
 				test_must_be_empty out &&
 				test_cmp expect actual
 			fi
@@ -620,21 +620,21 @@ do
 			if test "$arg2" = "-p"
 			then
 				cat >expect <<-EOF
-				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
-				fatal: Not a valid object name $bogus_long_sha1
+				error: header for $bogus_long_oid too long, exceeds 32 bytes
+				fatal: Not a valid object name $bogus_long_oid
 				EOF
 			else
 				cat >expect <<-EOF
-				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
+				error: header for $bogus_long_oid too long, exceeds 32 bytes
 				fatal: git cat-file: could not get object info
 				EOF
 			fi &&
 
 			if test "$arg1" = "--allow-unknown-type"
 			then
-				git cat-file $arg1 $arg2 $bogus_short_sha1
+				git cat-file $arg1 $arg2 $bogus_short_oid
 			else
-				test_must_fail git cat-file $arg1 $arg2 $bogus_long_sha1 >out 2>actual &&
+				test_must_fail git cat-file $arg1 $arg2 $bogus_long_oid >out 2>actual &&
 				test_must_be_empty out &&
 				test_cmp expect actual
 			fi
@@ -668,28 +668,28 @@ do
 done
 
 test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
-	git cat-file -e $bogus_short_sha1
+	git cat-file -e $bogus_short_oid
 '
 
 test_expect_success '-e can not be combined with --allow-unknown-type' '
-	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_sha1
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_oid
 '
 
 test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
-	test_must_fail git cat-file -p $bogus_short_sha1 &&
-	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_sha1
+	test_must_fail git cat-file -p $bogus_short_oid &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_oid
 '
 
 test_expect_success '<type> <hash> does not work with objects of broken types' '
 	cat >err.expect <<-\EOF &&
 	fatal: invalid object type "bogus"
 	EOF
-	test_must_fail git cat-file $bogus_short_type $bogus_short_sha1 2>err.actual &&
+	test_must_fail git cat-file $bogus_short_type $bogus_short_oid 2>err.actual &&
 	test_cmp err.expect err.actual
 '
 
 test_expect_success 'broken types combined with --batch and --batch-check' '
-	echo $bogus_short_sha1 >bogus-oid &&
+	echo $bogus_short_oid >bogus-oid &&
 
 	cat >err.expect <<-\EOF &&
 	fatal: invalid object type
@@ -711,52 +711,52 @@ test_expect_success 'the --allow-unknown-type option does not consider replaceme
 	cat >expect <<-EOF &&
 	$bogus_short_type
 	EOF
-	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	git cat-file -t --allow-unknown-type $bogus_short_oid >actual &&
 	test_cmp expect actual &&
 
 	# Create it manually, as "git replace" will die on bogus
 	# types.
 	head=$(git rev-parse --verify HEAD) &&
-	test_when_finished "test-tool ref-store main delete-refs 0 msg refs/replace/$bogus_short_sha1" &&
-	test-tool ref-store main update-ref msg "refs/replace/$bogus_short_sha1" $head $ZERO_OID REF_SKIP_OID_VERIFICATION &&
+	test_when_finished "test-tool ref-store main delete-refs 0 msg refs/replace/$bogus_short_oid" &&
+	test-tool ref-store main update-ref msg "refs/replace/$bogus_short_oid" $head $ZERO_OID REF_SKIP_OID_VERIFICATION &&
 
 	cat >expect <<-EOF &&
 	commit
 	EOF
-	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	git cat-file -t --allow-unknown-type $bogus_short_oid >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_short_type >expect &&
-	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	git cat-file -t --allow-unknown-type $bogus_short_oid >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Size of broken object is correct" '
 	echo $bogus_short_size >expect &&
-	git cat-file -s --allow-unknown-type $bogus_short_sha1 >actual &&
+	git cat-file -s --allow-unknown-type $bogus_short_oid >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'clean up broken object' '
-	rm .git/objects/$(test_oid_to_path $bogus_short_sha1)
+	rm .git/objects/$(test_oid_to_path $bogus_short_oid)
 '
 
 test_expect_success "Type of broken object is correct when type is large" '
 	echo $bogus_long_type >expect &&
-	git cat-file -t --allow-unknown-type $bogus_long_sha1 >actual &&
+	git cat-file -t --allow-unknown-type $bogus_long_oid >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Size of large broken object is correct when type is large" '
 	echo $bogus_long_size >expect &&
-	git cat-file -s --allow-unknown-type $bogus_long_sha1 >actual &&
+	git cat-file -s --allow-unknown-type $bogus_long_oid >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'clean up broken object' '
-	rm .git/objects/$(test_oid_to_path $bogus_long_sha1)
+	rm .git/objects/$(test_oid_to_path $bogus_long_oid)
 '
 
 test_expect_success 'cat-file -t and -s on corrupt loose object' '
@@ -853,7 +853,7 @@ test_expect_success 'prep for symlink tests' '
 	test_ln_s_add loop2 loop1 &&
 	git add morx dir/subdir/ind2 dir/ind1 &&
 	git commit -am "test" &&
-	echo $hello_sha1 blob $hello_size >found
+	echo $hello_oid blob $hello_size >found
 '
 
 test_expect_success 'git cat-file --batch-check --follow-symlinks works for non-links' '
@@ -941,7 +941,7 @@ test_expect_success 'git cat-file --batch-check --follow-symlinks works for dir/
 	echo HEAD:dirlink/morx >>expect &&
 	echo HEAD:dirlink/morx | git cat-file --batch-check --follow-symlinks >actual &&
 	test_cmp expect actual &&
-	echo $hello_sha1 blob $hello_size >expect &&
+	echo $hello_oid blob $hello_size >expect &&
 	echo HEAD:dirlink/ind1 | git cat-file --batch-check --follow-symlinks >actual &&
 	test_cmp expect actual
 '
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 29/30] t1006: Test oid compatibility with cat-file
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (27 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 28/30] t1006: Rename sha1 to oid Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 19:55 ` [PATCH 30/30] t1016-compatObjectFormat: Add tests to verify the conversion between objects Eric W. Biederman
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Update the existing tests that are oid based to test that cat-file
works correctly with the normal oid and the compat_oid.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/t1006-cat-file.sh | 251 +++++++++++++++++++++++++++-----------------
 1 file changed, 154 insertions(+), 97 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 9b018b538950..23d3d37283bb 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -236,27 +236,38 @@ hello_size=$(strlen "$hello_content")
 hello_oid=$(echo_without_newline "$hello_content" | git hash-object --stdin)
 
 test_expect_success "setup" '
+	git config core.repositoryformatversion 1 &&
+	git config extensions.objectformat $test_hash_algo &&
+	git config extensions.compatobjectformat $test_compat_hash_algo &&
 	echo_without_newline "$hello_content" > hello &&
 	git update-index --add hello
 '
 
-run_tests 'blob' $hello_oid $hello_size "$hello_content" "$hello_content"
+run_blob_tests () {
+    oid=$1
 
-test_expect_success '--batch-command --buffer with flush for blob info' '
-	echo "$hello_oid blob $hello_size" >expect &&
-	test_write_lines "info $hello_oid" "flush" |
+    run_tests 'blob' $oid $hello_size "$hello_content" "$hello_content"
+
+    test_expect_success '--batch-command --buffer with flush for blob info' '
+	echo "$oid blob $hello_size" >expect &&
+	test_write_lines "info $oid" "flush" |
 	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
 	git cat-file --batch-command --buffer >actual &&
 	test_cmp expect actual
-'
+    '
 
-test_expect_success '--batch-command --buffer without flush for blob info' '
+    test_expect_success '--batch-command --buffer without flush for blob info' '
 	touch output &&
-	test_write_lines "info $hello_oid" |
+	test_write_lines "info $oid" |
 	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
 	git cat-file --batch-command --buffer >>output &&
 	test_must_be_empty output
-'
+    '
+}
+
+hello_compat_oid=$(git rev-parse --output-object-format=$test_compat_hash_algo $hello_oid)
+run_blob_tests $hello_oid
+run_blob_tests $hello_compat_oid
 
 test_expect_success '--batch-check without %(rest) considers whole line' '
 	echo "$hello_oid blob $hello_size" >expect &&
@@ -267,35 +278,58 @@ test_expect_success '--batch-check without %(rest) considers whole line' '
 '
 
 tree_oid=$(git write-tree)
+tree_compat_oid=$(git rev-parse --output-object-format=$test_compat_hash_algo $tree_oid)
 tree_size=$(($(test_oid rawsz) + 13))
+tree_compat_size=$(($(test_oid --hash=compat rawsz) + 13))
 tree_pretty_content="100644 blob $hello_oid	hello${LF}"
+tree_compat_pretty_content="100644 blob $hello_compat_oid	hello${LF}"
 
 run_tests 'tree' $tree_oid $tree_size "" "$tree_pretty_content"
+run_tests 'tree' $tree_compat_oid $tree_compat_size "" "$tree_compat_pretty_content"
 
 commit_message="Initial commit"
 commit_oid=$(echo_without_newline "$commit_message" | git commit-tree $tree_oid)
+commit_compat_oid=$(git rev-parse --output-object-format=$test_compat_hash_algo $commit_oid)
 commit_size=$(($(test_oid hexsz) + 137))
+commit_compat_size=$(($(test_oid --hash=compat hexsz) + 137))
 commit_content="tree $tree_oid
 author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> $GIT_AUTHOR_DATE
 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
 
 $commit_message"
 
+commit_compat_content="tree $tree_compat_oid
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> $GIT_AUTHOR_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+$commit_message"
+
 run_tests 'commit' $commit_oid $commit_size "$commit_content" "$commit_content"
+run_tests 'commit' $commit_compat_oid $commit_compat_size "$commit_compat_content" "$commit_compat_content"
 
-tag_header_without_timestamp="object $hello_oid
-type blob
+tag_header_without_oid="type blob
 tag hellotag
 tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
+tag_header_without_timestamp="object $hello_oid
+$tag_header_without_oid"
+tag_compat_header_without_timestamp="object $hello_compat_oid
+$tag_header_without_oid"
 tag_description="This is a tag"
 tag_content="$tag_header_without_timestamp 0 +0000
 
+$tag_description"
+tag_compat_content="$tag_compat_header_without_timestamp 0 +0000
+
 $tag_description"
 
 tag_oid=$(echo_without_newline "$tag_content" | git hash-object -t tag --stdin -w)
 tag_size=$(strlen "$tag_content")
 
+tag_compat_oid=$(git rev-parse --output-object-format=$test_compat_hash_algo $tag_oid)
+tag_compat_size=$(strlen "$tag_compat_content")
+
 run_tests 'tag' $tag_oid $tag_size "$tag_content" "$tag_content"
+run_tests 'tag' $tag_compat_oid $tag_compat_size "$tag_compat_content" "$tag_compat_content"
 
 test_expect_success "Reach a blob from a tag pointing to it" '
 	echo_without_newline "$hello_content" >expect &&
@@ -303,37 +337,43 @@ test_expect_success "Reach a blob from a tag pointing to it" '
 	test_cmp expect actual
 '
 
-for batch in batch batch-check batch-command
+for oid in $hello_oid $hello_compat_oid
 do
-    for opt in t s e p
+    for batch in batch batch-check batch-command
     do
+	for opt in t s e p
+	do
 	test_expect_success "Passing -$opt with --$batch fails" '
-	    test_must_fail git cat-file --$batch -$opt $hello_oid
+	    test_must_fail git cat-file --$batch -$opt $oid
 	'
 
 	test_expect_success "Passing --$batch with -$opt fails" '
-	    test_must_fail git cat-file -$opt --$batch $hello_oid
+	    test_must_fail git cat-file -$opt --$batch $oid
 	'
-    done
+	done
 
-    test_expect_success "Passing <type> with --$batch fails" '
-	test_must_fail git cat-file --$batch blob $hello_oid
-    '
+	test_expect_success "Passing <type> with --$batch fails" '
+	test_must_fail git cat-file --$batch blob $oid
+	'
 
-    test_expect_success "Passing --$batch with <type> fails" '
-	test_must_fail git cat-file blob --$batch $hello_oid
-    '
+	test_expect_success "Passing --$batch with <type> fails" '
+	test_must_fail git cat-file blob --$batch $oid
+	'
 
-    test_expect_success "Passing oid with --$batch fails" '
-	test_must_fail git cat-file --$batch $hello_oid
-    '
+	test_expect_success "Passing oid with --$batch fails" '
+	test_must_fail git cat-file --$batch $oid
+	'
+    done
 done
 
-for opt in t s e p
+for oid in $hello_oid $hello_compat_oid
 do
-    test_expect_success "Passing -$opt with --follow-symlinks fails" '
-	    test_must_fail git cat-file --follow-symlinks -$opt $hello_oid
+    for opt in t s e p
+    do
+	test_expect_success "Passing -$opt with --follow-symlinks fails" '
+	    test_must_fail git cat-file --follow-symlinks -$opt $oid
 	'
+    done
 done
 
 test_expect_success "--batch-check for a non-existent named object" '
@@ -386,112 +426,102 @@ test_expect_success 'empty --batch-check notices missing object' '
 	test_cmp expect actual
 '
 
-batch_input="$hello_oid
-$commit_oid
-$tag_oid
+batch_tests () {
+    boid=$1
+    loid=$2
+    lsize=$3
+    coid=$4
+    csize=$5
+    ccontent=$6
+    toid=$7
+    tsize=$8
+    tcontent=$9
+
+    batch_input="$boid
+$coid
+$toid
 deadbeef
 
 "
 
-printf "%s\0" \
-	"$hello_oid blob $hello_size" \
+    printf "%s\0" \
+	"$boid blob $hello_size" \
 	"$hello_content" \
-	"$commit_oid commit $commit_size" \
-	"$commit_content" \
-	"$tag_oid tag $tag_size" \
-	"$tag_content" \
+	"$coid commit $csize" \
+	"$ccontent" \
+	"$toid tag $tsize" \
+	"$tcontent" \
 	"deadbeef missing" \
 	" missing" >batch_output
 
-test_expect_success '--batch with multiple oids gives correct format' '
+    test_expect_success '--batch with multiple oids gives correct format' '
 	tr "\0" "\n" <batch_output >expect &&
 	echo_without_newline "$batch_input" >in &&
 	git cat-file --batch <in >actual &&
 	test_cmp expect actual
-'
+    '
 
-test_expect_success '--batch, -z with multiple oids gives correct format' '
+    test_expect_success '--batch, -z with multiple oids gives correct format' '
 	echo_without_newline_nul "$batch_input" >in &&
 	tr "\0" "\n" <batch_output >expect &&
 	git cat-file --batch -z <in >actual &&
 	test_cmp expect actual
-'
+    '
 
-test_expect_success '--batch, -Z with multiple oids gives correct format' '
+    test_expect_success '--batch, -Z with multiple oids gives correct format' '
 	echo_without_newline_nul "$batch_input" >in &&
 	git cat-file --batch -Z <in >actual &&
 	test_cmp batch_output actual
-'
+    '
 
-batch_check_input="$hello_oid
-$tree_oid
-$commit_oid
-$tag_oid
+batch_check_input="$boid
+$loid
+$coid
+$toid
 deadbeef
 
 "
 
-printf "%s\0" \
-	"$hello_oid blob $hello_size" \
-	"$tree_oid tree $tree_size" \
-	"$commit_oid commit $commit_size" \
-	"$tag_oid tag $tag_size" \
+    printf "%s\0" \
+	"$boid blob $hello_size" \
+	"$loid tree $lsize" \
+	"$coid commit $csize" \
+	"$toid tag $tsize" \
 	"deadbeef missing" \
 	" missing" >batch_check_output
 
-test_expect_success "--batch-check with multiple oids gives correct format" '
+    test_expect_success "--batch-check with multiple oids gives correct format" '
 	tr "\0" "\n" <batch_check_output >expect &&
 	echo_without_newline "$batch_check_input" >in &&
 	git cat-file --batch-check <in >actual &&
 	test_cmp expect actual
-'
+    '
 
-test_expect_success "--batch-check, -z with multiple oids gives correct format" '
+    test_expect_success "--batch-check, -z with multiple oids gives correct format" '
 	tr "\0" "\n" <batch_check_output >expect &&
 	echo_without_newline_nul "$batch_check_input" >in &&
 	git cat-file --batch-check -z <in >actual &&
 	test_cmp expect actual
-'
+    '
 
-test_expect_success "--batch-check, -Z with multiple oids gives correct format" '
+    test_expect_success "--batch-check, -Z with multiple oids gives correct format" '
 	echo_without_newline_nul "$batch_check_input" >in &&
 	git cat-file --batch-check -Z <in >actual &&
 	test_cmp batch_check_output actual
-'
-
-test_expect_success FUNNYNAMES 'setup with newline in input' '
-	touch -- "newline${LF}embedded" &&
-	git add -- "newline${LF}embedded" &&
-	git commit -m "file with newline embedded" &&
-	test_tick &&
-
-	printf "HEAD:newline${LF}embedded" >in
-'
-
-test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
-	git cat-file --batch-check -z <in >actual &&
-	echo "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
-	test_cmp expect actual
-'
-
-test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
-	git cat-file --batch-check -Z <in >actual &&
-	printf "%s\0" "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
-	test_cmp expect actual
-'
+    '
 
-batch_command_multiple_info="info $hello_oid
-info $tree_oid
-info $commit_oid
-info $tag_oid
+batch_command_multiple_info="info $boid
+info $loid
+info $coid
+info $toid
 info deadbeef"
 
-test_expect_success '--batch-command with multiple info calls gives correct format' '
+    test_expect_success '--batch-command with multiple info calls gives correct format' '
 	cat >expect <<-EOF &&
-	$hello_oid blob $hello_size
-	$tree_oid tree $tree_size
-	$commit_oid commit $commit_size
-	$tag_oid tag $tag_size
+	$boid blob $hello_size
+	$loid tree $lsize
+	$coid commit $csize
+	$toid tag $tsize
 	deadbeef missing
 	EOF
 
@@ -510,22 +540,22 @@ test_expect_success '--batch-command with multiple info calls gives correct form
 	git cat-file --batch-command --buffer -Z <in >actual &&
 
 	test_cmp expect_nul actual
-'
+    '
 
-batch_command_multiple_contents="contents $hello_oid
-contents $commit_oid
-contents $tag_oid
+batch_command_multiple_contents="contents $boid
+contents $coid
+contents $toid
 contents deadbeef
 flush"
 
-test_expect_success '--batch-command with multiple command calls gives correct format' '
+    test_expect_success '--batch-command with multiple command calls gives correct format' '
 	printf "%s\0" \
-		"$hello_oid blob $hello_size" \
+		"$boid blob $hello_size" \
 		"$hello_content" \
-		"$commit_oid commit $commit_size" \
-		"$commit_content" \
-		"$tag_oid tag $tag_size" \
-		"$tag_content" \
+		"$coid commit $csize" \
+		"$ccontent" \
+		"$toid tag $tsize" \
+		"$tcontent" \
 		"deadbeef missing" >expect_nul &&
 	tr "\0" "\n" <expect_nul >expect &&
 
@@ -543,6 +573,33 @@ test_expect_success '--batch-command with multiple command calls gives correct f
 	git cat-file --batch-command --buffer -Z <in >actual &&
 
 	test_cmp expect_nul actual
+    '
+
+}
+
+batch_tests $hello_oid $tree_oid $tree_size $commit_oid $commit_size "$commit_content" $tag_oid $tag_size "$tag_content"
+batch_tests $hello_compat_oid $tree_compat_oid $tree_compat_size $commit_compat_oid $commit_compat_size "$commit_compat_content" $tag_compat_oid $tag_compat_size "$tag_compat_content"
+
+
+test_expect_success FUNNYNAMES 'setup with newline in input' '
+	touch -- "newline${LF}embedded" &&
+	git add -- "newline${LF}embedded" &&
+	git commit -m "file with newline embedded" &&
+	test_tick &&
+
+	printf "HEAD:newline${LF}embedded" >in
+'
+
+test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
+	git cat-file --batch-check -z <in >actual &&
+	echo "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
+	git cat-file --batch-check -Z <in >actual &&
+	printf "%s\0" "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
+	test_cmp expect actual
 '
 
 test_expect_success 'setup blobs which are likely to delta' '
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH 30/30] t1016-compatObjectFormat: Add tests to verify the conversion between objects
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (28 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 29/30] t1006: Test oid compatibility with cat-file Eric W. Biederman
@ 2023-09-27 19:55 ` Eric W. Biederman
  2023-09-27 21:31 ` [PATCH 00/30] Initial support for multiple hash functions Junio C Hamano
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-27 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

For now my strategy is simple.  Create two identical repositories one
in each format.  Use fixed timestamps. Verify the dynamically computed
compatibility objects from one repository match the objects stored in
the other repository.

A general limitation of this strategy is that the git when generating
signed tags and commits with compatObjectFormat enabled will generate
a signature for both formats.  To overcome this limitation I have
added "test-tool delete-gpgsig" that when fed an signed commit or tag
with two signatures deletes one of the signatures.

With that in place I can have "git commit" and  "git tag" generate
signed objects, have my tool delete one, and feed the new object
into "git hash-object" to create the kinds of commits and tags
git without compatObjectFormat enabled will generate.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Makefile                      |   1 +
 t/helper/test-delete-gpgsig.c |  62 ++++++++
 t/helper/test-tool.c          |   1 +
 t/helper/test-tool.h          |   1 +
 t/t1016-compatObjectFormat.sh | 280 ++++++++++++++++++++++++++++++++++
 t/t1016/gpg                   |   2 +
 6 files changed, 347 insertions(+)
 create mode 100644 t/helper/test-delete-gpgsig.c
 create mode 100755 t/t1016-compatObjectFormat.sh
 create mode 100755 t/t1016/gpg

diff --git a/Makefile b/Makefile
index 3c18664def9a..3e4444fb9ab2 100644
--- a/Makefile
+++ b/Makefile
@@ -790,6 +790,7 @@ TEST_BUILTINS_OBJS += test-crontab.o
 TEST_BUILTINS_OBJS += test-csprng.o
 TEST_BUILTINS_OBJS += test-ctype.o
 TEST_BUILTINS_OBJS += test-date.o
+TEST_BUILTINS_OBJS += test-delete-gpgsig.o
 TEST_BUILTINS_OBJS += test-delta.o
 TEST_BUILTINS_OBJS += test-dir-iterator.o
 TEST_BUILTINS_OBJS += test-drop-caches.o
diff --git a/t/helper/test-delete-gpgsig.c b/t/helper/test-delete-gpgsig.c
new file mode 100644
index 000000000000..e36831af03f6
--- /dev/null
+++ b/t/helper/test-delete-gpgsig.c
@@ -0,0 +1,62 @@
+#include "test-tool.h"
+#include "gpg-interface.h"
+#include "strbuf.h"
+
+
+int cmd__delete_gpgsig(int argc, const char **argv)
+{
+	struct strbuf buf = STRBUF_INIT;
+	const char *pattern = "gpgsig";
+	const char *bufptr, *tail, *eol;
+	int deleting = 0;
+	size_t plen;
+
+	if (argc >= 2) {
+		pattern = argv[1];
+		argv++;
+		argc--;
+	}
+
+	plen = strlen(pattern);
+	strbuf_read(&buf, 0, 0);
+
+	if (!strcmp(pattern, "trailer")) {
+		size_t payload_size = parse_signed_buffer(buf.buf, buf.len);
+		fwrite(buf.buf, 1, payload_size, stdout);
+		fflush(stdout);
+		return 0;
+	}
+
+	bufptr = buf.buf;
+	tail = bufptr + buf.len;
+
+	while (bufptr < tail) {
+		/* Find the end of the line */
+		eol = memchr(bufptr, '\n', tail - bufptr);
+		if (!eol)
+			eol = tail;
+
+		/* Drop continuation lines */
+		if (deleting && (bufptr < eol) && (bufptr[0] == ' ')) {
+			bufptr = eol + 1;
+			continue;
+		}
+		deleting = 0;
+
+		/* Does the line match the prefix? */
+		if (((bufptr + plen) < eol) &&
+		    !memcmp(bufptr, pattern, plen) &&
+		    (bufptr[plen] == ' ')) {
+			deleting = 1;
+			bufptr = eol + 1;
+			continue;
+		}
+
+		/* Print all other lines */
+		fwrite(bufptr, 1, (eol - bufptr) + 1, stdout);
+		bufptr = eol + 1;
+	}
+	fflush(stdout);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index abe8a785eb65..8b6c84f202d6 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -21,6 +21,7 @@ static struct test_cmd cmds[] = {
 	{ "csprng", cmd__csprng },
 	{ "ctype", cmd__ctype },
 	{ "date", cmd__date },
+	{ "delete-gpgsig", cmd__delete_gpgsig },
 	{ "delta", cmd__delta },
 	{ "dir-iterator", cmd__dir_iterator },
 	{ "drop-caches", cmd__drop_caches },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ea2672436c9a..76baaece35b9 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -15,6 +15,7 @@ int cmd__csprng(int argc, const char **argv);
 int cmd__ctype(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
+int cmd__delete_gpgsig(int argc, const char **argv);
 int cmd__dir_iterator(int argc, const char **argv);
 int cmd__drop_caches(int argc, const char **argv);
 int cmd__dump_cache_tree(int argc, const char **argv);
diff --git a/t/t1016-compatObjectFormat.sh b/t/t1016-compatObjectFormat.sh
new file mode 100755
index 000000000000..bb558a1d562a
--- /dev/null
+++ b/t/t1016-compatObjectFormat.sh
@@ -0,0 +1,280 @@
+#!/bin/sh
+#
+# Copyright (c) 2023 Eric Biederman
+#
+
+test_description='Test how well compatObjectFormat works'
+
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-gpg.sh
+
+# All of the follow variables must be defined in the environment:
+# GIT_AUTHOR_NAME
+# GIT_AUTHOR_EMAIL
+# GIT_AUTHOR_DATE
+# GIT_COMMITTER_NAME
+# GIT_COMMITTER_EMAIL
+# GIT_COMMITTER_DATE
+#
+# The test relies on these variables being set so that the two
+# different commits in two different repositories encoded with two
+# different hash functions result in the same content in the commits.
+# This means that when the commit is translated between hash functions
+# the commit is identical to the commit in the other repository.
+
+compat_hash () {
+    case "$1" in
+    "sha1")
+	echo "sha256"
+	;;
+    "sha256")
+	echo "sha1"
+	;;
+    esac
+}
+
+hello_oid () {
+    case "$1" in
+    "sha1")
+	echo "$hello_sha1_oid"
+	;;
+    "sha256")
+	echo "$hello_sha256_oid"
+	;;
+    esac
+}
+
+tree_oid () {
+    case "$1" in
+    "sha1")
+	echo "$tree_sha1_oid"
+	;;
+    "sha256")
+	echo "$tree_sha256_oid"
+	;;
+    esac
+}
+
+commit_oid () {
+    case "$1" in
+    "sha1")
+	echo "$commit_sha1_oid"
+	;;
+    "sha256")
+	echo "$commit_sha256_oid"
+	;;
+    esac
+}
+
+commit2_oid () {
+    case "$1" in
+    "sha1")
+	echo "$commit2_sha1_oid"
+	;;
+    "sha256")
+	echo "$commit2_sha256_oid"
+	;;
+    esac
+}
+
+del_sigcommit () {
+    local delete=$1
+
+    if test "$delete" = "sha256" ; then
+	local pattern="gpgsig-sha256"
+    else
+	local pattern="gpgsig"
+    fi
+    test-tool delete-gpgsig "$pattern"
+}
+
+
+del_sigtag () {
+    local storage=$1
+    local delete=$2
+
+    if test "$storage" = "$delete" ; then
+	local pattern="trailer"
+    elif test "$storage" = "sha256" ; then
+	local pattern="gpgsig"
+    else
+	local pattern="gpgsig-sha256"
+    fi
+    test-tool delete-gpgsig "$pattern"
+}
+
+base=$(pwd)
+for hash in sha1 sha256
+do
+	cd "$base"
+	mkdir -p repo-$hash
+	cd repo-$hash
+
+	test_expect_success "setup $hash repository" '
+		git init --object-format=$hash &&
+		git config core.repositoryformatversion 1 &&
+		git config extensions.objectformat $hash &&
+		git config extensions.compatobjectformat $(compat_hash $hash) &&
+		git config gpg.program $TEST_DIRECTORY/t1016/gpg &&
+		echo "Hellow World!" > hello &&
+		eval hello_${hash}_oid=$(git hash-object hello) &&
+		git update-index --add hello &&
+		git commit -m "Initial commit" &&
+		eval commit_${hash}_oid=$(git rev-parse HEAD) &&
+		eval tree_${hash}_oid=$(git rev-parse HEAD^{tree})
+	'
+	test_expect_success "create a $hash  tagged blob" '
+		git tag --no-sign -m "This is a tag" hellotag $(hello_oid $hash) &&
+		eval hellotag_${hash}_oid=$(git rev-parse hellotag)
+	'
+	test_expect_success "create a $hash tagged tree" '
+		git tag --no-sign -m "This is a tag" treetag $(tree_oid $hash) &&
+		eval treetag_${hash}_oid=$(git rev-parse treetag)
+	'
+	test_expect_success "create a $hash tagged commit" '
+		git tag --no-sign -m "This is a tag" committag $(commit_oid $hash) &&
+		eval committag_${hash}_oid=$(git rev-parse committag)
+	'
+	test_expect_success GPG2 "create a $hash signed commit" '
+		git commit --gpg-sign --allow-empty -m "This is a signed commit" &&
+		eval signedcommit_${hash}_oid=$(git rev-parse HEAD)
+	'
+	test_expect_success GPG2 "create a $hash signed tag" '
+		git tag -s -m "This is a signed tag" signedtag HEAD &&
+		eval signedtag_${hash}_oid=$(git rev-parse signedtag)
+	'
+	test_expect_success "create a $hash branch" '
+		git checkout -b branch $(commit_oid $hash) &&
+		echo "More more more give me more!" > more &&
+		eval more_${hash}_oid=$(git hash-object more) &&
+		echo "Another and another and another" > another &&
+		eval another_${hash}_oid=$(git hash-object another) &&
+		git update-index --add more another &&
+		git commit -m "Add more files!" &&
+		eval commit2_${hash}_oid=$(git rev-parse HEAD) &&
+		eval tree2_${hash}_oid=$(git rev-parse HEAD^{tree})
+	'
+	test_expect_success GPG2 "create another $hash signed tag" '
+		git tag -s -m "This is another signed tag" signedtag2 $(commit2_oid $hash) &&
+		eval signedtag2_${hash}_oid=$(git rev-parse signedtag2)
+	'
+	test_expect_success GPG2 "merge the $hash branches together" '
+		git merge -S -m "merge some signed tags together" signedtag signedtag2 &&
+		eval signedcommit2_${hash}_oid=$(git rev-parse HEAD)
+	'
+	test_expect_success GPG2 "create additional $hash signed commits" '
+		git commit --gpg-sign --allow-empty -m "This is an additional signed commit" &&
+		git cat-file commit HEAD | del_sigcommit sha256 > "../${hash}_signedcommit3" &&
+		git cat-file commit HEAD | del_sigcommit sha1 > "../${hash}_signedcommit4" &&
+		eval signedcommit3_${hash}_oid=$(git hash-object -t commit -w ../${hash}_signedcommit3) &&
+		eval signedcommit4_${hash}_oid=$(git hash-object -t commit -w ../${hash}_signedcommit4)
+	'
+	test_expect_success GPG2 "create additional $hash signed tags" '
+		git tag -s -m "This is an additional signed tag" signedtag34 HEAD &&
+		git cat-file tag signedtag34 | del_sigtag "${hash}" sha256 > ../${hash}_signedtag3 &&
+		git cat-file tag signedtag34 | del_sigtag "${hash}" sha1 > ../${hash}_signedtag4 &&
+		eval signedtag3_${hash}_oid=$(git hash-object -t tag -w ../${hash}_signedtag3) &&
+		eval signedtag4_${hash}_oid=$(git hash-object -t tag -w ../${hash}_signedtag4)
+	'
+done
+cd "$base"
+
+compare_oids () {
+    test "$#" = 5 && { local PREREQ=$1; shift; } || PREREQ=
+    local type="$1"
+    local name="$2"
+    local sha1_oid="$3"
+    local sha256_oid="$4"
+
+    echo ${sha1_oid} > ${name}_sha1_expected
+    echo ${sha256_oid} > ${name}_sha256_expected
+    echo ${type} > ${name}_type_expected
+
+    git --git-dir=repo-sha1/.git rev-parse --output-object-format=sha256 ${sha1_oid} > ${name}_sha1_sha256_found
+    git --git-dir=repo-sha256/.git rev-parse --output-object-format=sha1 ${sha256_oid} > ${name}_sha256_sha1_found
+    local sha1_sha256_oid=$(cat ${name}_sha1_sha256_found)
+    local sha256_sha1_oid=$(cat ${name}_sha256_sha1_found)
+
+    test_expect_success $PREREQ "Verify ${type} ${name}'s sha1 oid" '
+	git --git-dir=repo-sha256/.git rev-parse --output-object-format=sha1 ${sha256_oid} > ${name}_sha1 &&
+	test_cmp ${name}_sha1 ${name}_sha1_expected
+'
+
+    test_expect_success $PREREQ "Verify ${type} ${name}'s sha256 oid" '
+	git --git-dir=repo-sha1/.git rev-parse --output-object-format=sha256 ${sha1_oid} > ${name}_sha256 &&
+	test_cmp ${name}_sha256 ${name}_sha256_expected
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha1 type" '
+	git --git-dir=repo-sha1/.git cat-file -t ${sha1_oid} > ${name}_type1 &&
+	git --git-dir=repo-sha256/.git cat-file -t ${sha256_sha1_oid} > ${name}_type2 &&
+	test_cmp ${name}_type1 ${name}_type2 &&
+	test_cmp ${name}_type1 ${name}_type_expected
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha256 type" '
+	git --git-dir=repo-sha256/.git cat-file -t ${sha256_oid} > ${name}_type3 &&
+	git --git-dir=repo-sha1/.git cat-file -t ${sha1_sha256_oid} > ${name}_type4 &&
+	test_cmp ${name}_type3 ${name}_type4 &&
+	test_cmp ${name}_type3 ${name}_type_expected
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha1 size" '
+	git --git-dir=repo-sha1/.git cat-file -s ${sha1_oid} > ${name}_size1 &&
+	git --git-dir=repo-sha256/.git cat-file -s ${sha256_sha1_oid} > ${name}_size2 &&
+	test_cmp ${name}_size1 ${name}_size2
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha256 size" '
+	git --git-dir=repo-sha256/.git cat-file -s ${sha256_oid} > ${name}_size3 &&
+	git --git-dir=repo-sha1/.git cat-file -s ${sha1_sha256_oid} > ${name}_size4 &&
+	test_cmp ${name}_size3 ${name}_size4
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha1 pretty content" '
+	git --git-dir=repo-sha1/.git cat-file -p ${sha1_oid} > ${name}_content1 &&
+	git --git-dir=repo-sha256/.git cat-file -p ${sha256_sha1_oid} > ${name}_content2 &&
+	test_cmp ${name}_content1 ${name}_content2
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha256 pretty content" '
+	git --git-dir=repo-sha256/.git cat-file -p ${sha256_oid} > ${name}_content3 &&
+	git --git-dir=repo-sha1/.git cat-file -p ${sha1_sha256_oid} > ${name}_content4 &&
+	test_cmp ${name}_content3 ${name}_content4
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha1 content" '
+	git --git-dir=repo-sha1/.git cat-file ${type} ${sha1_oid} > ${name}_content5 &&
+	git --git-dir=repo-sha256/.git cat-file ${type} ${sha256_sha1_oid} > ${name}_content6 &&
+	test_cmp ${name}_content5 ${name}_content6
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha256 content" '
+	git --git-dir=repo-sha256/.git cat-file ${type} ${sha256_oid} > ${name}_content7 &&
+	git --git-dir=repo-sha1/.git cat-file ${type} ${sha1_sha256_oid} > ${name}_content8 &&
+	test_cmp ${name}_content7 ${name}_content8
+'
+
+}
+
+compare_oids 'blob' hello "$hello_sha1_oid" "$hello_sha256_oid"
+compare_oids 'tree' tree "$tree_sha1_oid" "$tree_sha256_oid"
+compare_oids 'commit' commit "$commit_sha1_oid" "$commit_sha256_oid"
+compare_oids GPG2 'commit' signedcommit "$signedcommit_sha1_oid" "$signedcommit_sha256_oid"
+compare_oids 'tag' hellotag "$hellotag_sha1_oid" "$hellotag_sha256_oid"
+compare_oids 'tag' treetag "$treetag_sha1_oid" "$treetag_sha256_oid"
+compare_oids 'tag' committag "$committag_sha1_oid" "$committag_sha256_oid"
+compare_oids GPG2 'tag' signedtag "$signedtag_sha1_oid" "$signedtag_sha256_oid"
+
+compare_oids 'blob' more "$more_sha1_oid" "$more_sha256_oid"
+compare_oids 'blob' another "$another_sha1_oid" "$another_sha256_oid"
+compare_oids 'tree' tree2 "$tree2_sha1_oid" "$tree2_sha256_oid"
+compare_oids 'commit' commit2 "$commit2_sha1_oid" "$commit2_sha256_oid"
+compare_oids GPG2 'tag' signedtag2 "$signedtag2_sha1_oid" "$signedtag2_sha256_oid"
+compare_oids GPG2 'commit' signedcommit2 "$signedcommit2_sha1_oid" "$signedcommit2_sha256_oid"
+compare_oids GPG2 'commit' signedcommit3 "$signedcommit3_sha1_oid" "$signedcommit3_sha256_oid"
+compare_oids GPG2 'commit' signedcommit4 "$signedcommit4_sha1_oid" "$signedcommit4_sha256_oid"
+compare_oids GPG2 'tag' signedtag3 "$signedtag3_sha1_oid" "$signedtag3_sha256_oid"
+compare_oids GPG2 'tag' signedtag4 "$signedtag4_sha1_oid" "$signedtag4_sha256_oid"
+
+test_done
diff --git a/t/t1016/gpg b/t/t1016/gpg
new file mode 100755
index 000000000000..2601cb18a5b3
--- /dev/null
+++ b/t/t1016/gpg
@@ -0,0 +1,2 @@
+#!/bin/sh
+exec gpg --faked-system-time "20230918T154812" "$@"
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* Re: [PATCH 01/30] object-file-convert: Stubs for converting from one object format to another
  2023-09-27 19:55 ` [PATCH 01/30] object-file-convert: Stubs for converting from one object format to another Eric W. Biederman
@ 2023-09-27 20:42   ` Eric Sunshine
  2023-10-02  1:22     ` Eric W. Biederman
  0 siblings, 1 reply; 104+ messages in thread
From: Eric Sunshine @ 2023-09-27 20:42 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric W. Biederman

On Wed, Sep 27, 2023 at 3:55 PM Eric W. Biederman <ebiederm@gmail.com> wrote:
> Two basic functions are provided:
> - convert_object_file Takes an object file it's type and hash algorithm
>   and converts it into the equivalent object file that would
>   have been generated with hash algorithm "to".
>
>   For blob objects there is no converstion to be done and it is an
>   error to use this function on them.

s/converstion/conversion/

>   For commit, tree, and tag objects embedded oids are replaced by the
>   oids of the objects they refer to with those objects and their
>   object ids reencoded in with the hash algorithm "to".  Signatures
>   are rearranged so that they remain valid after the object has
>   been reencoded.
>
> - repo_oid_to_algop which takes an oid that refers to an object file
>   and returns the oid of the equavalent object file generated
>   with the target hash algorithm.

s/equavalent/equivalent/

> The pair of files object-file-convert.c and object-file-convert.h are
> introduced to hold as much of this logic as possible to keep this
> conversion logic cleanly separated from everything else and in the
> hopes that someday the code will be clean enough git can support
> compiling out support for sha1 and the various conversion functions.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Just some minor comments below, many of which are subjective
style-related observations, thus not necessarily actionable, but also
one or two legitimate questions.

> diff --git a/object-file-convert.c b/object-file-convert.c
> @@ -0,0 +1,57 @@
> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> +                     const struct git_hash_algo *to, struct object_id *dest)
> +{
> +       /*
> +        * If the source alogirthm is not set, then we're using the
> +        * default hash algorithm for that object.
> +        */

s/alogirthm/algorithm/

> +       const struct git_hash_algo *from =
> +               src->algo ? &hash_algos[src->algo] : repo->hash_algo;
> +
> +       if (from == to) {
> +               if (src != dest)
> +                       oidcpy(dest, src);
> +               return 0;
> +       }
> +       return -1;
> +}

On this project, we usually get the simple cases out of the way first,
which often reduces the indentation level, making the code easier to
digest at a glance. So, it would be typical to write this as:

    if (from != to)
        return -1
    if (src != dest)
        oidcpy(dest, src);
    return 0;

or even:

    if (from != to)
        return -1
    if (src == dest)
        return 0;
    oidcpy(dest, src);
    return 0;

This way, for instance, the reader doesn't get to the end of the
function and then have to scan backward to understand the condition of
the `return -1`.

> +int convert_object_file(struct strbuf *outbuf,
> +                       const struct git_hash_algo *from,
> +                       const struct git_hash_algo *to,
> +                       const void *buf, size_t len,
> +                       enum object_type type,
> +                       int gentle)
> +{
> +       int ret;
> +
> +       /* Don't call this function when no conversion is necessary */
> +       if ((from == to) || (type == OBJ_BLOB))
> +               die("Refusing noop object file conversion");

Several comments...

Style: we usually reduce the noise level by dropping the extra parentheses:

    if (from == to || type == OBJ_BLOB)

Does this condition represent a programming error or a runtime error
triggerable by some input? If a programming error, then use BUG()
rather than die().

If a triggerable runtime error, then...

* start user-facing messages with lowercase rather than capitalized word

* make the user-facing message localizable so readers of other
languages can digest it

    die(_("refusing do-nothing object conversion"));

On the other hand, don't make BUG() messages localizable.

> +       switch (type) {
> +       case OBJ_COMMIT:
> +       case OBJ_TREE:
> +       case OBJ_TAG:
> +       default:
> +               /* Not implemented yet, so fail. */
> +               ret = -1;
> +               break;
> +       }
> +       if (!ret)
> +               return 0;
> +       if (gentle) {
> +               strbuf_release(outbuf);
> +               return ret;
> +       }

This function appears to be a mere skeleton at the moment, so it's
difficult to judge at this point whether you are using `outbuf` as a
bag of bytes or as a legitimate string container. If the latter, then
the API may be reasonable, but if you're using it as a bag-of-bytes,
then it feels like you're leaking an implementation detail into the
API.

> +       die(_("Failed to convert object from %s to %s"),
> +               from->name, to->name);

s/Failed/failed/

For people trying to diagnose this problem, would it be helpful to
present more information about the failed conversion, such as object
type and perhaps even its OID?

> diff --git a/object-file-convert.h b/object-file-convert.h
> @@ -0,0 +1,24 @@
> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> +                     const struct git_hash_algo *to, struct object_id *dest);

I suppose the function name is pretty much self-explanatory to those
familiar with the underlying concepts, but it might still be helpful
to add a comment explaining what the function does.

> +/*
> + * Convert an object file from one hash algorithm to another algorithm.
> + * Return -1 on failure, 0 on success.
> + */
> +int convert_object_file(struct strbuf *outbuf,
> +                       const struct git_hash_algo *from,
> +                       const struct git_hash_algo *to,
> +                       const void *buf, size_t len,
> +                       enum object_type type,
> +                       int gentle);

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 00/30] Initial support for multiple hash functions
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (29 preceding siblings ...)
  2023-09-27 19:55 ` [PATCH 30/30] t1016-compatObjectFormat: Add tests to verify the conversion between objects Eric W. Biederman
@ 2023-09-27 21:31 ` Junio C Hamano
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
  31 siblings, 0 replies; 104+ messages in thread
From: Junio C Hamano @ 2023-09-27 21:31 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: git, brian m. carlson

"Eric W. Biederman" <ebiederm@gmail.com> writes:

> I have been going over and over this patchset trying to figure
> out if it is ready to be merged.  I don't know of any deficiencies
> so it is at a point it could benefit from a set of eyes that
> are not mine.
>
> I had planned to wait a little bit longer but there are some on-going
> conversations that could benefit from people seeing what it means for a
> repository to support two hash functions at the same time.

Thanks.  On top of what commit are these patches expected to be applied?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
  2023-09-27 19:55 ` [PATCH 21/30] repository: Implement extensions.compatObjectFormat Eric W. Biederman
@ 2023-09-27 21:39   ` Junio C Hamano
  2023-09-28 20:18     ` Junio C Hamano
  2023-10-02  1:31     ` Eric W. Biederman
  0 siblings, 2 replies; 104+ messages in thread
From: Junio C Hamano @ 2023-09-27 21:39 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: git, brian m. carlson, Eric W. Biederman

"Eric W. Biederman" <ebiederm@gmail.com> writes:

> diff --git a/setup.c b/setup.c
> index deb5a33fe9e1..87b40472dbc5 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -598,6 +598,25 @@ static enum extension_result handle_extension(const char *var,
>  		}

This line in the pre-context needed fuzzing, but otherwise the
series applied cleanly on top of v2.42.0.

> Subject: Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat

"Implement" -> "implement" (many other patches share the same
problem, none of which I fixed while queueing).


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 02/30] oid-array: Teach oid-array to handle multiple kinds of oids
  2023-09-27 19:55 ` [PATCH 02/30] oid-array: Teach oid-array to handle multiple kinds of oids Eric W. Biederman
@ 2023-09-27 23:20   ` Eric Sunshine
  0 siblings, 0 replies; 104+ messages in thread
From: Eric Sunshine @ 2023-09-27 23:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric W. Biederman

On Wed, Sep 27, 2023 at 3:55 PM Eric W. Biederman <ebiederm@gmail.com> wrote:
> While looking at how to handle input of both SHA-1 and SHA-256 oids in
> get_oid_with_context, I realized that the oid_array in
> repo_for_each_abbrev might have more than one kind of oid stored in it
> simulataneously.

s/simulataneously/simultaneously/

> Update to oid_array_append to ensure that oids added to an oid array
> always have an algorithm set.
>
> Update void_hashcmp to first verify two oids use the same hash algorithm
> before comparing them to each other.
>
> With that oid-array should be safe to use with differnt kinds of
> oids simultaneously.

s/differnt/different/

> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 03/30] object-names: Support input of oids in any supported hash
  2023-09-27 19:55 ` [PATCH 03/30] object-names: Support input of oids in any supported hash Eric W. Biederman
@ 2023-09-27 23:29   ` Eric Sunshine
  2023-10-02  1:54     ` Eric W. Biederman
  0 siblings, 1 reply; 104+ messages in thread
From: Eric Sunshine @ 2023-09-27 23:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric W. Biederman

On Wed, Sep 27, 2023 at 3:56 PM Eric W. Biederman <ebiederm@gmail.com> wrote:
> Support short oids encoded in any algorithm, while ensuring enough of
> the oid is specified to disambiguate between all of the oids in the
> repository encoded in any algorithm.
>
> By default have the code continue to only accept oids specified in the
> storage hash algorithm of the repository, but when something is
> ambiguous display all of the possible oids from any oid encoding.
>
> A new flag is added GET_OID_HASH_ANY that when supplied causes the
> code to accept oids specified in any hash algorithm, and to return the
> oids that were resolved.
>
> This implements the functionality that allows both SHA-1 and SHA-256
> object names, from the "Object names on the command line" section of
> the hash function transition document.
>
> Care is taken in get_short_oid so that when the result is ambiguous
> the output remains the same of GIT_OID_HASH_ANY was not supplied.

s/of/as if/

> If GET_OID_HASH_ANY was supplied objects of any hash algorithm
> that match the prefix are displayed.
>
> This required updating repo_for_each_abbrev to give it a parameter
> so that it knows to look at all hash algorithms.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> diff --git a/object-name.c b/object-name.c
> @@ -49,6 +50,7 @@ struct disambiguate_state {
>  static void update_candidates(struct disambiguate_state *ds, const struct object_id *current)
>  {
> +       /* The hash algorithm of the current has already been filtered */

Is there a word missing after "current"?

> @@ -503,8 +516,13 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
> -       if (a_type == b_type)
> -               return oidcmp(a, b);
> +       if (a_type == b_type) {
> +               /* Is the hash algorithm the same? */
> +               if (a->algo == b->algo)
> +                       return oidcmp(a, b);
> +               else
> +                       return a->algo > b->algo ? 1 : -1;
> +       }

Nit: unnecessary comment ("Is the hash algorithm...") is merely
repeating what the code itself already says clearly enough

> @@ -553,6 +575,7 @@ static enum get_oid_result get_short_oid(struct repository *r,
>         else
>                 ds.fn = default_disambiguate_hint;
>
> +
>         find_short_object_filename(&ds);

Nit: unnecessary new blank line

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects
  2023-09-27 19:55 ` [PATCH 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
@ 2023-09-28  7:14   ` Eric Sunshine
  2023-10-02  2:11     ` Eric W. Biederman
  0 siblings, 1 reply; 104+ messages in thread
From: Eric Sunshine @ 2023-09-28  7:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric W . Biederman

On Wed, Sep 27, 2023 at 3:56 PM Eric W. Biederman <ebiederm@gmail.com> wrote:
> As part of the transition plan, we'd like to add a file in the .git
> directory that maps loose objects between SHA-1 and SHA-256.  Let's
> implement the specification in the transition plan and store this data
> on a per-repository basis in struct repository.
>
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
> diff --git a/loose.c b/loose.c
> @@ -0,0 +1,245 @@
> +static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir)
> +{
> +       struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
> +       FILE *fp;
> +
> +       if (!dir->loose_map)
> +               loose_object_map_init(&dir->loose_map);
> +
> +       insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
> +       insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree);
> +
> +       insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
> +       insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob);
> +
> +       insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
> +       insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid);
> +
> +       strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
> +       fp = fopen(path.buf, "rb");
> +       if (!fp)
> +               return 0;

This early return leaks `path`. At minimum, call
`strbuf_release(&path)` before returning.

> +       errno = 0;
> +       if (strbuf_getwholeline(&buf, fp, '\n') || strcmp(buf.buf, loose_object_header))
> +               goto err;
> +       while (!strbuf_getline_lf(&buf, fp)) {
> +               const char *p;
> +               struct object_id oid, compat_oid;
> +               if (parse_oid_hex_algop(buf.buf, &oid, &p, repo->hash_algo) ||
> +                   *p++ != ' ' ||
> +                   parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) ||
> +                   p != buf.buf + buf.len)
> +                       goto err;
> +               insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid);
> +               insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid);
> +       }
> +
> +       strbuf_release(&buf);
> +       strbuf_release(&path);
> +       return errno ? -1 : 0;
> +err:
> +       strbuf_release(&buf);
> +       strbuf_release(&path);
> +       return -1;
> +}
> +
> +int repo_write_loose_object_map(struct repository *repo)
> +{
> +       kh_oid_map_t *map = repo->objects->odb->loose_map->to_compat;
> +       struct lock_file lock;
> +       int fd;
> +       khiter_t iter;
> +       struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
> +
> +       if (!should_use_loose_object_map(repo))
> +               return 0;
> +
> +       strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
> +       fd = hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1);
> +       iter = kh_begin(map);
> +       if (write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0)
> +               goto errout;
> +
> +       for (; iter != kh_end(map); iter++) {
> +               if (kh_exist(map, iter)) {
> +                       if (oideq(&kh_key(map, iter), the_hash_algo->empty_tree) ||
> +                           oideq(&kh_key(map, iter), the_hash_algo->empty_blob))
> +                               continue;
> +                       strbuf_addf(&buf, "%s %s\n", oid_to_hex(&kh_key(map, iter)), oid_to_hex(kh_value(map, iter)));
> +                       if (write_in_full(fd, buf.buf, buf.len) < 0)
> +                               goto errout;
> +                       strbuf_reset(&buf);
> +               }
> +       }

Nit: If you call strbuf_reset() immediately before strbuf_addf(), then
you save the reader of the code the effort of having to scan backward
through the function to verify that `buf` is empty the first time
through the loop.

> +       strbuf_release(&buf);
> +       if (commit_lock_file(&lock) < 0) {
> +               error_errno(_("could not write loose object index %s"), path.buf);
> +               strbuf_release(&path);
> +               return -1;
> +       }
> +       strbuf_release(&path);
> +       return 0;
> +errout:
> +       rollback_lock_file(&lock);
> +       strbuf_release(&buf);
> +       error_errno(_("failed to write loose object index %s\n"), path.buf);
> +       strbuf_release(&path);
> +       return -1;
> +}
> +
> +int repo_loose_object_map_oid(struct repository *repo,
> +                             const struct object_id *src,
> +                             const struct git_hash_algo *to,
> +                             struct object_id *dest)
> +
> +{

Style: unnecessary blank line before opening `{`

> +       struct object_directory *dir;
> +       kh_oid_map_t *map;
> +       khiter_t pos;
> +
> +       for (dir = repo->objects->odb; dir; dir = dir->next) {
> +               struct loose_object_map *loose_map = dir->loose_map;
> +               if (!loose_map)
> +                       continue;
> +               map = (to == repo->compat_hash_algo) ?
> +                       loose_map->to_compat :
> +                       loose_map->to_storage;
> +               pos = kh_get_oid_map(map, *src);
> +               if (pos < kh_end(map)) {
> +                       oidcpy(dest, kh_value(map, pos));
> +                       return 0;
> +               }
> +       }
> +       return -1;
> +}

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
  2023-09-27 21:39   ` Junio C Hamano
@ 2023-09-28 20:18     ` Junio C Hamano
  2023-09-29  0:50       ` Eric Biederman
  2023-09-29 16:59       ` Eric W. Biederman
  2023-10-02  1:31     ` Eric W. Biederman
  1 sibling, 2 replies; 104+ messages in thread
From: Junio C Hamano @ 2023-09-28 20:18 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: git, brian m. carlson, Eric W. Biederman

Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>
>> diff --git a/setup.c b/setup.c
>> index deb5a33fe9e1..87b40472dbc5 100644
>> --- a/setup.c
>> +++ b/setup.c
>> @@ -598,6 +598,25 @@ static enum extension_result handle_extension(const char *var,
>>  		}
>
> This line in the pre-context needed fuzzing, but otherwise the
> series applied cleanly on top of v2.42.0.
>
>> Subject: Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
>
> "Implement" -> "implement" (many other patches share the same
> problem, none of which I fixed while queueing).


The topic when merged near the tip of 'seen' seems to break a few CI
jobs here and there.  The log from the broken run can be seen at

    https://github.com/git/git/actions/runs/6331978214

You may have to log-in there before you can view the details.

Thanks.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
  2023-09-28 20:18     ` Junio C Hamano
@ 2023-09-29  0:50       ` Eric Biederman
  2023-09-29 16:59       ` Eric W. Biederman
  1 sibling, 0 replies; 104+ messages in thread
From: Eric Biederman @ 2023-09-29  0:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman



On September 28, 2023 3:18:46 PM CDT, Junio C Hamano <gitster@pobox.com> wrote:
>Junio C Hamano <gitster@pobox.com> writes:
>
>> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>>
>>> diff --git a/setup.c b/setup.c
>>> index deb5a33fe9e1..87b40472dbc5 100644
>>> --- a/setup.c
>>> +++ b/setup.c
>>> @@ -598,6 +598,25 @@ static enum extension_result handle_extension(const char *var,
>>>  		}
>>
>> This line in the pre-context needed fuzzing, but otherwise the
>> series applied cleanly on top of v2.42.0.
>>
>>> Subject: Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
>>
>> "Implement" -> "implement" (many other patches share the same
>> problem, none of which I fixed while queueing).
>
>
>The topic when merged near the tip of 'seen' seems to break a few CI
>jobs here and there.  The log from the broken run can be seen at
>
>    https://github.com/git/git/actions/runs/6331978214
>
>You may have to log-in there before you can view the details.

Thanks.

It might take me a couple of days before I can dig into this, but I will dig in and see if I can understand and fix the build failures.

With any luck it will be something simple like forgetting that {} != {0}.

Eric 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
  2023-09-28 20:18     ` Junio C Hamano
  2023-09-29  0:50       ` Eric Biederman
@ 2023-09-29 16:59       ` Eric W. Biederman
  2023-09-29 18:48         ` Junio C Hamano
  1 sibling, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2023-09-29 16:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>>
>>> diff --git a/setup.c b/setup.c
>>> index deb5a33fe9e1..87b40472dbc5 100644
>>> --- a/setup.c
>>> +++ b/setup.c
>>> @@ -598,6 +598,25 @@ static enum extension_result handle_extension(const char *var,
>>>  		}
>>
>> This line in the pre-context needed fuzzing, but otherwise the
>> series applied cleanly on top of v2.42.0.
>>
>>> Subject: Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
>>
>> "Implement" -> "implement" (many other patches share the same
>> problem, none of which I fixed while queueing).
>
>
> The topic when merged near the tip of 'seen' seems to break a few CI
> jobs here and there.  The log from the broken run can be seen at
>
>     https://github.com/git/git/actions/runs/6331978214
>
> You may have to log-in there before you can view the details.

Did you have any manual merge conflicts you had to resolve?
If so it is possible to see the merge result you had?

There is a static failure in commit.c of oidcpy because it thinks the
array is zero size.  That is weird, but once I get a test environment
setup I expect I can figure out what it is talking about.

There in linux-leaks it lists a bunch of test failures, and unless I see
what code is actually failing I am not certain I can figure it out.

Thanks,
Eric


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
  2023-09-29 16:59       ` Eric W. Biederman
@ 2023-09-29 18:48         ` Junio C Hamano
  2023-10-02  0:48           ` Eric W. Biederman
  0 siblings, 1 reply; 104+ messages in thread
From: Junio C Hamano @ 2023-09-29 18:48 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: git, brian m. carlson, Eric W. Biederman

"Eric W. Biederman" <ebiederm@gmail.com> writes:

> Did you have any manual merge conflicts you had to resolve?
> If so it is possible to see the merge result you had?

The only merge-fix I had to apply to make everything compile was
this:

diff --git a/bloom.c b/bloom.c
index ff131893cd..59eb0a0481 100644
--- a/bloom.c
+++ b/bloom.c
@@ -278,7 +278,7 @@ static int has_entries_with_high_bit(struct repository *r, struct tree *t)
 		struct tree_desc desc;
 		struct name_entry entry;
 
-		init_tree_desc(&desc, t->buffer, t->size);
+		init_tree_desc(&desc, &t->object.oid, t->buffer, t->size);
 		while (tree_entry(&desc, &entry)) {
 			size_t i;
 			for (i = 0; i < entry.pathlen; i++) {

as one topic changed the function signature while the other topic
added a new callsite.

Everything else was pretty-much auto resolved, I think.

Output from "git show --cc seen" matches my recollection.  The above
does appear as an evil merge.


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
  2023-09-29 18:48         ` Junio C Hamano
@ 2023-10-02  0:48           ` Eric W. Biederman
  0 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  0:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>
>> Did you have any manual merge conflicts you had to resolve?
>> If so it is possible to see the merge result you had?
>
> The only merge-fix I had to apply to make everything compile was
> this:
>
> diff --git a/bloom.c b/bloom.c
> index ff131893cd..59eb0a0481 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -278,7 +278,7 @@ static int has_entries_with_high_bit(struct repository *r, struct tree *t)
>  		struct tree_desc desc;
>  		struct name_entry entry;
>  
> -		init_tree_desc(&desc, t->buffer, t->size);
> +		init_tree_desc(&desc, &t->object.oid, t->buffer, t->size);
>  		while (tree_entry(&desc, &entry)) {
>  			size_t i;
>  			for (i = 0; i < entry.pathlen; i++) {
>
> as one topic changed the function signature while the other topic
> added a new callsite.
>
> Everything else was pretty-much auto resolved, I think.
>
> Output from "git show --cc seen" matches my recollection.  The above
> does appear as an evil merge.

Thanks, and I found all of this on your seen branch.

After tracking all of these down it appears all of the errors
came from my branch, I will be resending the patches as soon
as I finish going through the review comments.

Looking at the build errors pretty much all of the all of the
automatic test failures came from commit_tree_extended.

There was a strbuf that did
strbuf_init(&buf, 8192);
strbuf_init(&buf, 8192);
twice.

Plus there was another buffer that was allocated and not freed,
in commit_tree_extended.

The leaks were a bit tricky to track down as building with SANITIZE=leak
causes tests to fail somewhat randomly for me with "gcc (Debian
12.2.0-14) 12.2.0".

There was one smatch static-analysis error that suggested using
CALLOC_ARRAY instead of xcalloc.

The "win" build and "linux-gcc-default (ubuntu-lastest)" build failed
because of an over eager gcc warning -Werror=array-bounds.
Claiming:

 In file included from /usr/include/string.h:535,
                  from git-compat-util.h:228,
                  from commit.c:1:
 In function ‘memcpy’,
     inlined from ‘oidcpy’ at hash-ll.h:272:2,
     inlined from ‘commit_tree_extended’ at commit.c:1705:3:
 ##[error]/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: ‘__builtin_memcpy’ offset [0, 31] is out of the bounds [0, 0] [-Werror=array-bounds]
    29 |   return __builtin___memcpy_chk (__dest, __src, __len,
       |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    30 |                                  __glibc_objsize0 (__dest));
       |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~

Previously in commit_tree_extended the structure was:

	while (parents) {
		struct commit *parent = pop_commit(&parents);
		strbuf_addf(&buffer, "parent %s\n",
			    oid_to_hex(&parent->object.oid));
	}

brian had changed it to:

	nparents = commit_list_count(parents);
	parent_buf = xcalloc(nparents, sizeof(*parent_buf));
	for (i = 0; i < nparents; i++) {
		struct commit *parent = pop_commit(&parents);
		oidcpy(&parent_buf[i], &parent->object.oid);
	}

Which is perfectly sound code.

I changed the structure of the loop to:

	nparents = commit_list_count(parents);
	parent_buf = xcalloc(nparents, sizeof(*parent_buf));
	i = 0;
	while (parents) {
		struct commit *parent = pop_commit(&parents);
		oidcpy(&parent_buf[i++], &parent->object.oid);
	}

And the "array-bounds" warning had no problems with the code.
So it looks like the error was actually that array-bounds thought
there was a potential NULL pointer dereference at which point
it would not have array bounds, and then it complained about
the array bounds, instead of the NULL pointer dereference.

I am going to fix the patch.  If array-bounds causes further
problems you may want to think about disabling it.

Eric

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 01/30] object-file-convert: Stubs for converting from one object format to another
  2023-09-27 20:42   ` Eric Sunshine
@ 2023-10-02  1:22     ` Eric W. Biederman
  2023-10-02  2:27       ` Eric Sunshine
  0 siblings, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  1:22 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Junio C Hamano, git, brian m. carlson, Eric W. Biederman

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Wed, Sep 27, 2023 at 3:55 PM Eric W. Biederman <ebiederm@gmail.com> wrote:
>> Two basic functions are provided:
>> - convert_object_file Takes an object file it's type and hash algorithm
>>   and converts it into the equivalent object file that would
>>   have been generated with hash algorithm "to".
>>
>>   For blob objects there is no converstion to be done and it is an
>>   error to use this function on them.
>
> s/converstion/conversion/
>
>>   For commit, tree, and tag objects embedded oids are replaced by the
>>   oids of the objects they refer to with those objects and their
>>   object ids reencoded in with the hash algorithm "to".  Signatures
>>   are rearranged so that they remain valid after the object has
>>   been reencoded.
>>
>> - repo_oid_to_algop which takes an oid that refers to an object file
>>   and returns the oid of the equavalent object file generated
>>   with the target hash algorithm.
>
> s/equavalent/equivalent/
>
>> The pair of files object-file-convert.c and object-file-convert.h are
>> introduced to hold as much of this logic as possible to keep this
>> conversion logic cleanly separated from everything else and in the
>> hopes that someday the code will be clean enough git can support
>> compiling out support for sha1 and the various conversion functions.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Just some minor comments below, many of which are subjective
> style-related observations, thus not necessarily actionable, but also
> one or two legitimate questions.
>
>> diff --git a/object-file-convert.c b/object-file-convert.c
>> @@ -0,0 +1,57 @@
>> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
>> +                     const struct git_hash_algo *to, struct object_id *dest)
>> +{
>> +       /*
>> +        * If the source alogirthm is not set, then we're using the
>> +        * default hash algorithm for that object.
>> +        */
>
> s/alogirthm/algorithm/
>
>> +       const struct git_hash_algo *from =
>> +               src->algo ? &hash_algos[src->algo] : repo->hash_algo;
>> +
>> +       if (from == to) {
>> +               if (src != dest)
>> +                       oidcpy(dest, src);
>> +               return 0;
>> +       }
>> +       return -1;
>> +}
>
> On this project, we usually get the simple cases out of the way first,
> which often reduces the indentation level, making the code easier to
> digest at a glance. So, it would be typical to write this as:
>
>     if (from != to)
>         return -1
>     if (src != dest)
>         oidcpy(dest, src);
>     return 0;
>
> or even:
>
>     if (from != to)
>         return -1
>     if (src == dest)
>         return 0;
>     oidcpy(dest, src);
>     return 0;
>
> This way, for instance, the reader doesn't get to the end of the
> function and then have to scan backward to understand the condition of
> the `return -1`.

The "return -1" is there only because it is a stub, and it is there
where the rest of the code needs to go.

As for simple cases the "if (from == to)" case is a simple case I am
getting out of the way.  It unfortunately is cluttered by the fact
that "oidcpy(&oid, &oid)" is not valid so it has to guard the copy.

if (from == to) {
	if (src == dest)
        	return 0;
        oidcpy(dest, src);
        return 0;
}

Could be used but is wordier.  And duplicates the return code
for the same case so I am not enthusiastic about it.


>> +int convert_object_file(struct strbuf *outbuf,
>> +                       const struct git_hash_algo *from,
>> +                       const struct git_hash_algo *to,
>> +                       const void *buf, size_t len,
>> +                       enum object_type type,
>> +                       int gentle)
>> +{
>> +       int ret;
>> +
>> +       /* Don't call this function when no conversion is necessary */
>> +       if ((from == to) || (type == OBJ_BLOB))
>> +               die("Refusing noop object file conversion");
>
> Several comments...
>
> Style: we usually reduce the noise level by dropping the extra parentheses:
>
>     if (from == to || type == OBJ_BLOB)

I honestly can not be confident of C code that does that.

The precedence of the operators in C has been wrong for longer than I
have been programming, and I can never remember exactly how the
precedence is wrong.  So for the last 30 years I have been adding enough
parenthesis that I don't have to remember.

> Does this condition represent a programming error or a runtime error
> triggerable by some input? If a programming error, then use BUG()
> rather than die().

Agreed BUG would be better there.

> If a triggerable runtime error, then...
>
> * start user-facing messages with lowercase rather than capitalized word
>
> * make the user-facing message localizable so readers of other
> languages can digest it
>
>     die(_("refusing do-nothing object conversion"));
>
> On the other hand, don't make BUG() messages localizable.
>
>> +       switch (type) {
>> +       case OBJ_COMMIT:
>> +       case OBJ_TREE:
>> +       case OBJ_TAG:
>> +       default:
>> +               /* Not implemented yet, so fail. */
>> +               ret = -1;
>> +               break;
>> +       }
>> +       if (!ret)
>> +               return 0;
>> +       if (gentle) {
>> +               strbuf_release(outbuf);
>> +               return ret;
>> +       }
>
> This function appears to be a mere skeleton at the moment, so it's
> difficult to judge at this point whether you are using `outbuf` as a
> bag of bytes or as a legitimate string container. If the latter, then
> the API may be reasonable, but if you're using it as a bag-of-bytes,
> then it feels like you're leaking an implementation detail into the
> API.

It is a string that represents the entire object.

It might be arguable if tree objects are text given that trees represent
oids in binary, but for tag and commit objects they are definitely one
big text string.

This is an implementation detail that makes the code simpler, and less
error prone.

I have not encountered anything where a string buffer would not be
a reasonable fit.

>> +       die(_("Failed to convert object from %s to %s"),
>> +               from->name, to->name);
>
> s/Failed/failed/

I don't understand wanting to start a sentence with a lower case letter.
Can you explain?

> For people trying to diagnose this problem, would it be helpful to
> present more information about the failed conversion, such as object
> type and perhaps even its OID?

I expect some of that context will come from the conversion functions
themselves.  Those messages are almost certain to give the type
information one way or another because they are type specific.

That said it requires some version of corrupt repository to reach this
error.  Either missing mapping tables, or a corrupt object.

So I don't know how much it matters to get this perfect the first
time.  We can improve the error message we gain experience.

I would agree that including the OID makes sense.  Unfortunately the
code does not have the OID at this point.

>> diff --git a/object-file-convert.h b/object-file-convert.h
>> @@ -0,0 +1,24 @@
>> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
>> +                     const struct git_hash_algo *to, struct object_id *dest);
>
> I suppose the function name is pretty much self-explanatory to those
> familiar with the underlying concepts, but it might still be helpful
> to add a comment explaining what the function does.

I could use words that repeat what is in the function signature.
But I don't think I could add anything.

I would have to say something like:

Look up the oid that an equivalent object would have in a repository
whose object format is "to".

Is that helpful?

Eric

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
  2023-09-27 21:39   ` Junio C Hamano
  2023-09-28 20:18     ` Junio C Hamano
@ 2023-10-02  1:31     ` Eric W. Biederman
  1 sibling, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  1:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric W. Biederman

Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>
>> Subject: Re: [PATCH 21/30] repository: Implement extensions.compatObjectFormat
>
> "Implement" -> "implement" (many other patches share the same
> problem, none of which I fixed while queueing).

Why shouldn't sentences begin with a capital letter?

I agree it isn't a very extensive sentence just two words but it is a
complete thought, and thus is a sentence.

There is great value in uniformity in a project so I will make the
change, but it seems very weird to me.

Eric

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 03/30] object-names: Support input of oids in any supported hash
  2023-09-27 23:29   ` Eric Sunshine
@ 2023-10-02  1:54     ` Eric W. Biederman
  0 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  1:54 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Junio C Hamano, git, brian m. carlson, Eric W. Biederman

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Wed, Sep 27, 2023 at 3:56 PM Eric W. Biederman <ebiederm@gmail.com> wrote:
>> Support short oids encoded in any algorithm, while ensuring enough of
>> the oid is specified to disambiguate between all of the oids in the
>> repository encoded in any algorithm.
>>
>> By default have the code continue to only accept oids specified in the
>> storage hash algorithm of the repository, but when something is
>> ambiguous display all of the possible oids from any oid encoding.
>>
>> A new flag is added GET_OID_HASH_ANY that when supplied causes the
>> code to accept oids specified in any hash algorithm, and to return the
>> oids that were resolved.
>>
>> This implements the functionality that allows both SHA-1 and SHA-256
>> object names, from the "Object names on the command line" section of
>> the hash function transition document.
>>
>> Care is taken in get_short_oid so that when the result is ambiguous
>> the output remains the same of GIT_OID_HASH_ANY was not supplied.
>
> s/of/as if/

Actually s/of/if/

Thank you for catching that.  When reviewing this to understand what I
was trying to say I found a bug.  The function repo_for_each_abbrev was
passing algo to init_object_disambiguation to properly initialize
ds.bin_pfx.algo, and then was resetting ds.bin_pfx.algo to GIT_HASH_ANY.
Oops!

>> If GET_OID_HASH_ANY was supplied objects of any hash algorithm
>> that match the prefix are displayed.
>>
>> This required updating repo_for_each_abbrev to give it a parameter
>> so that it knows to look at all hash algorithms.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>> diff --git a/object-name.c b/object-name.c
>> @@ -49,6 +50,7 @@ struct disambiguate_state {
>>  static void update_candidates(struct disambiguate_state *ds, const struct object_id *current)
>>  {
>> +       /* The hash algorithm of the current has already been filtered */
>
> Is there a word missing after "current"?

No.  I forgot to remove "the" before current.

>> @@ -503,8 +516,13 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
>> -       if (a_type == b_type)
>> -               return oidcmp(a, b);
>> +       if (a_type == b_type) {
>> +               /* Is the hash algorithm the same? */
>> +               if (a->algo == b->algo)
>> +                       return oidcmp(a, b);
>> +               else
>> +                       return a->algo > b->algo ? 1 : -1;
>> +       }
>
> Nit: unnecessary comment ("Is the hash algorithm...") is merely
> repeating what the code itself already says clearly enough

Fair enough.

>
>> @@ -553,6 +575,7 @@ static enum get_oid_result get_short_oid(struct repository *r,
>>         else
>>                 ds.fn = default_disambiguate_hint;
>>
>> +
>>         find_short_object_filename(&ds);
>
> Nit: unnecessary new blank line

Naughty thing how did that sneak in there ;)


Eric

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects
  2023-09-28  7:14   ` Eric Sunshine
@ 2023-10-02  2:11     ` Eric W. Biederman
  2023-10-02  2:36       ` Eric Sunshine
  0 siblings, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:11 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Junio C Hamano, git, brian m. carlson, Eric W . Biederman

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Wed, Sep 27, 2023 at 3:56 PM Eric W. Biederman <ebiederm@gmail.com> wrote:
>> As part of the transition plan, we'd like to add a file in the .git
>> directory that maps loose objects between SHA-1 and SHA-256.  Let's
>> implement the specification in the transition plan and store this data
>> on a per-repository basis in struct repository.
>>
>> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> ---
>> diff --git a/loose.c b/loose.c
>> @@ -0,0 +1,245 @@
>> +static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir)
>> +{
>> +       struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
>> +       FILE *fp;
>> +
>> +       if (!dir->loose_map)
>> +               loose_object_map_init(&dir->loose_map);
>> +
>> +       insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
>> +       insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree);
>> +
>> +       insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
>> +       insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob);
>> +
>> +       insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
>> +       insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid);
>> +
>> +       strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
>> +       fp = fopen(path.buf, "rb");
>> +       if (!fp)
>> +               return 0;
>
> This early return leaks `path`. At minimum, call
> `strbuf_release(&path)` before returning.

yep.  Fixed.

>> +       errno = 0;
>> +       if (strbuf_getwholeline(&buf, fp, '\n') || strcmp(buf.buf, loose_object_header))
>> +               goto err;
>> +       while (!strbuf_getline_lf(&buf, fp)) {
>> +               const char *p;
>> +               struct object_id oid, compat_oid;
>> +               if (parse_oid_hex_algop(buf.buf, &oid, &p, repo->hash_algo) ||
>> +                   *p++ != ' ' ||
>> +                   parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) ||
>> +                   p != buf.buf + buf.len)
>> +                       goto err;
>> +               insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid);
>> +               insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid);
>> +       }
>> +
>> +       strbuf_release(&buf);
>> +       strbuf_release(&path);
>> +       return errno ? -1 : 0;
>> +err:
>> +       strbuf_release(&buf);
>> +       strbuf_release(&path);
>> +       return -1;
>> +}
>> +
>> +int repo_write_loose_object_map(struct repository *repo)
>> +{
>> +       kh_oid_map_t *map = repo->objects->odb->loose_map->to_compat;
>> +       struct lock_file lock;
>> +       int fd;
>> +       khiter_t iter;
>> +       struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
>> +
>> +       if (!should_use_loose_object_map(repo))
>> +               return 0;
>> +
>> +       strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
>> +       fd = hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1);
>> +       iter = kh_begin(map);
>> +       if (write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0)
>> +               goto errout;
>> +
>> +       for (; iter != kh_end(map); iter++) {
>> +               if (kh_exist(map, iter)) {
>> +                       if (oideq(&kh_key(map, iter), the_hash_algo->empty_tree) ||
>> +                           oideq(&kh_key(map, iter), the_hash_algo->empty_blob))
>> +                               continue;
>> +                       strbuf_addf(&buf, "%s %s\n", oid_to_hex(&kh_key(map, iter)), oid_to_hex(kh_value(map, iter)));
>> +                       if (write_in_full(fd, buf.buf, buf.len) < 0)
>> +                               goto errout;
>> +                       strbuf_reset(&buf);
>> +               }
>> +       }
>
> Nit: If you call strbuf_reset() immediately before strbuf_addf(), then
> you save the reader of the code the effort of having to scan backward
> through the function to verify that `buf` is empty the first time
> through the loop.

I am actually perplexed why the code works this way.

My gut says we should create the entire buffer in memory and then
write it to disk all in a single system call, and not perform
this line buffering.

Doing that would remove the need for the strbuf_reset entirely,
and would just require the buffer to be primed with the
loose_object_header.

But what is there works and I will leave it for now.

It isn't a bug, and it can be improved with a follow up patch.

>> +       strbuf_release(&buf);
>> +       if (commit_lock_file(&lock) < 0) {
>> +               error_errno(_("could not write loose object index %s"), path.buf);
>> +               strbuf_release(&path);
>> +               return -1;
>> +       }
>> +       strbuf_release(&path);
>> +       return 0;
>> +errout:
>> +       rollback_lock_file(&lock);
>> +       strbuf_release(&buf);
>> +       error_errno(_("failed to write loose object index %s\n"), path.buf);
>> +       strbuf_release(&path);
>> +       return -1;
>> +}
>> +
>> +int repo_loose_object_map_oid(struct repository *repo,
>> +                             const struct object_id *src,
>> +                             const struct git_hash_algo *to,
>> +                             struct object_id *dest)
>> +
>> +{
>
> Style: unnecessary blank line before opening `{`

Yep. Fixed.
>
>> +       struct object_directory *dir;
>> +       kh_oid_map_t *map;
>> +       khiter_t pos;
>> +
>> +       for (dir = repo->objects->odb; dir; dir = dir->next) {
>> +               struct loose_object_map *loose_map = dir->loose_map;
>> +               if (!loose_map)
>> +                       continue;
>> +               map = (to == repo->compat_hash_algo) ?
>> +                       loose_map->to_compat :
>> +                       loose_map->to_storage;
>> +               pos = kh_get_oid_map(map, *src);
>> +               if (pos < kh_end(map)) {
>> +                       oidcpy(dest, kh_value(map, pos));
>> +                       return 0;
>> +               }
>> +       }
>> +       return -1;
>> +}

Eric

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 01/30] object-file-convert: Stubs for converting from one object format to another
  2023-10-02  1:22     ` Eric W. Biederman
@ 2023-10-02  2:27       ` Eric Sunshine
  0 siblings, 0 replies; 104+ messages in thread
From: Eric Sunshine @ 2023-10-02  2:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric W. Biederman

On Sun, Oct 1, 2023 at 9:22 PM Eric W. Biederman <ebiederm@gmail.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> > On Wed, Sep 27, 2023 at 3:55 PM Eric W. Biederman <ebiederm@gmail.com> wrote:
> >> +       die(_("Failed to convert object from %s to %s"),
> >> +               from->name, to->name);
> >
> > s/Failed/failed/
>
> I don't understand wanting to start a sentence with a lower case letter.
> Can you explain?

Consistency with most of the rest of the messages emitted by Git.
CodingGuidelines call for lowercase and omission of the full-stop
(period).

The choice, of course, is subjective, but these are the guidelines
upon which the project has settled for better or worse. (You can
certainly find older code, which predates the guideline, using
capitalized messages, but it's good to adhere to the guideline for new
code if possible.)

> >> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> >> +                     const struct git_hash_algo *to, struct object_id *dest);
> >
> > I suppose the function name is pretty much self-explanatory to those
> > familiar with the underlying concepts, but it might still be helpful
> > to add a comment explaining what the function does.
>
> I could use words that repeat what is in the function signature.
> But I don't think I could add anything.
>
> I would have to say something like:
>
> Look up the oid that an equivalent object would have in a repository
> whose object format is "to".
>
> Is that helpful?

That would help clarify the function a bit for me. Explaining the
function's return value could also be useful.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects
  2023-10-02  2:11     ` Eric W. Biederman
@ 2023-10-02  2:36       ` Eric Sunshine
  0 siblings, 0 replies; 104+ messages in thread
From: Eric Sunshine @ 2023-10-02  2:36 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric W . Biederman

On Sun, Oct 1, 2023 at 10:11 PM Eric W. Biederman <ebiederm@gmail.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> > On Wed, Sep 27, 2023 at 3:56 PM Eric W. Biederman <ebiederm@gmail.com> wrote:
> >> +       for (; iter != kh_end(map); iter++) {
> >> +               if (kh_exist(map, iter)) {
> >> +                       if (oideq(&kh_key(map, iter), the_hash_algo->empty_tree) ||
> >> +                           oideq(&kh_key(map, iter), the_hash_algo->empty_blob))
> >> +                               continue;
> >> +                       strbuf_addf(&buf, "%s %s\n", oid_to_hex(&kh_key(map, iter)), oid_to_hex(kh_value(map, iter)));
> >> +                       if (write_in_full(fd, buf.buf, buf.len) < 0)
> >> +                               goto errout;
> >> +                       strbuf_reset(&buf);
> >> +               }
> >> +       }
> >
> > Nit: If you call strbuf_reset() immediately before strbuf_addf(), then
> > you save the reader of the code the effort of having to scan backward
> > through the function to verify that `buf` is empty the first time
> > through the loop.
>
> I am actually perplexed why the code works this way.
>
> My gut says we should create the entire buffer in memory and then
> write it to disk all in a single system call, and not perform
> this line buffering.

I think I had a similar question/thought while reading the patch but...

> Doing that would remove the need for the strbuf_reset entirely,
> and would just require the buffer to be primed with the
> loose_object_header.
>
> But what is there works and I will leave it for now.

... came to this same conclusion.

> It isn't a bug, and it can be improved with a follow up patch.

Exactly.

I haven't thought through the consequences, but perhaps brian was
worried about the memory footprint of buffering the whole thing before
writing to disk.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v2 00/30] initial support for multiple hash functions
  2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
                   ` (30 preceding siblings ...)
  2023-09-27 21:31 ` [PATCH 00/30] Initial support for multiple hash functions Junio C Hamano
@ 2023-10-02  2:39 ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 01/30] object-file-convert: stubs for converting from one object format to another Eric W. Biederman
                     ` (31 more replies)
  31 siblings, 32 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

This addresses all of the known test failures from v1 of this set of
changes.  In particular I have reworked commit_tree_extended which
was flagged by smatch, -Werror=array-bounds, and the leak detector.

One functional bug was fixed in repo_for_each_abbrev where it was
mistakenly displaying too many ambiguous oids.

I am posting this so that people review and testing of this patchset
won't be distracted by the known and fixed issues.

A key part of the hash function transition plan is a way that a single
git repository can inter-operate with git repositories whose storage
hash function is SHA-1 and git repositories whose storage hash function
is SHA-256.

This interoperability can defined in terms of two repositories one whose
storage hash function is SHA-1 and another whose storage hash function
is SHA-256.  Those two repositories receive exactly the same objects,
but they store them in different but equivalent ways.

For a repository that has one storage hash function to inter-operate
with a repository that has a different storage hash function requires
the first repository to be able produce it's objects as if they were
stored in the second hash function.

This series of changes focuses on implementing the pieces that allow
a repository that uses one storage hash function to produce the objects
that would have been stored with a second storage hash function.

The final patch in this series is the addition of a test that creates
two repositories one that uses SHA-1 as it's storage hash function
and the other that uses SHA-256 as it's storage hash function.
Identical operations are performed on the two repositories, and their
compatibility objects are compared to verify they are the same.
AKA the SHA-1 repository on the fly generates the objects store in
the SHA-256 repository, and the SHA-256 repository on the fly generates
the objects that are stored in the SHA-1 repository.

There are two fundamental technologies for enabling this.
- The ability to convert a stored object into the object the
  other repository would have stored.
- The ability to remember a mapping between SHA-1 and SHA-256 oids
  of equivalent objects.

With such technologies it is very easy to implement user facing changes.
To avoid locking git into poor decisions by accident I have done my best
to minimize the user facing changes, while still building the internal
infrastructure that is needed for interoperability.

All of this work is inspired by earlier work on interoperability by
"brian m. carlson" and some of the key pieces of code are still his.

To get to the point where I can test if a SHA-1 and a SHA-256 repository
can on the fly generate each other, I have made some small user-facing
changes.

git rev-parse now supports --output-object-format as a way to query
the internal mapping tables between oids and report the equivalent
oid of the other format.

git cat-file when given a oid that does not match the repositories
storage format will now attempt to find the oids equivalent object that
is stored in the repository and if found dynamically generate the object
that would have been stored in a repository with a different storage
hash function and display the object.

An additional file loose-object-index will be stored in ".git/objects/".

An additional option "extensions.compatObjectFormat" is implemented,
that generates and stores mappings between the oids of objects stored in
the repository and oids of the equivalent objects that would be stored
in a repository show storage format was extensions.compatObjectFormat.

Eric W. Biederman (23):
      object-file-convert: stubs for converting from one object format to another
      oid-array: teach oid-array to handle multiple kinds of oids
      object-names: support input of oids in any supported hash
      repository: add a compatibility hash algorithm
      loose: compatibilty short name support
      object-file: update the loose object map when writing loose objects
      object-file: add a compat_oid_in parameter to write_object_file_flags
      commit: convert mergetag before computing the signature of a commit
      commit: export add_header_signature to support handling signatures on tags
      tag: sign both hashes
      object: factor out parse_mode out of fast-import and tree-walk into in object.h
      object-file-convert: don't leak when converting tag objects
      object-file-convert: convert commits that embed signed tags
      object-file: update object_info_extended to reencode objects
      rev-parse: add an --output-object-format parameter
      builtin/cat-file: let the oid determine the output algorithm
      tree-walk: init_tree_desc take an oid to get the hash algorithm
      object-file: handle compat objects in check_object_signature
      builtin/ls-tree: let the oid determine the output algorithm
      test-lib: compute the compatibility hash so tests may use it
      t1006: rename sha1 to oid
      t1006: test oid compatibility with cat-file
      t1016-compatObjectFormat: add tests to verify the conversion between objects

brian m. carlson (7):
      loose: add a mapping between SHA-1 and SHA-256 for loose objects
      commit: write commits for both hashes
      cache: add a function to read an OID of a specific algorithm
      object-file-convert: add a function to convert trees between algorithms
      object-file-convert: convert tag objects when writing
      object-file-convert: convert commit objects when writing
      repository: implement extensions.compatObjectFormat

 Documentation/config/extensions.txt |  12 ++
 Documentation/git-rev-parse.txt     |  12 ++
 Makefile                            |   3 +
 archive.c                           |   3 +-
 builtin/am.c                        |   6 +-
 builtin/cat-file.c                  |  12 +-
 builtin/checkout.c                  |   8 +-
 builtin/clone.c                     |   2 +-
 builtin/commit.c                    |   2 +-
 builtin/fast-import.c               |  18 +-
 builtin/grep.c                      |   8 +-
 builtin/ls-tree.c                   |   5 +-
 builtin/merge.c                     |   3 +-
 builtin/pack-objects.c              |   6 +-
 builtin/read-tree.c                 |   2 +-
 builtin/rev-parse.c                 |  25 ++-
 builtin/stash.c                     |   5 +-
 builtin/tag.c                       |  45 ++++-
 cache-tree.c                        |   4 +-
 commit.c                            | 221 ++++++++++++++++-----
 commit.h                            |   1 +
 delta-islands.c                     |   2 +-
 diff-lib.c                          |   2 +-
 fsck.c                              |   6 +-
 hash-ll.h                           |   1 +
 hash.h                              |   9 +-
 http-push.c                         |   2 +-
 list-objects.c                      |   2 +-
 loose.c                             | 259 ++++++++++++++++++++++++
 loose.h                             |  22 +++
 match-trees.c                       |   4 +-
 merge-ort.c                         |  11 +-
 merge-recursive.c                   |   2 +-
 merge.c                             |   3 +-
 object-file-convert.c               | 277 ++++++++++++++++++++++++++
 object-file-convert.h               |  24 +++
 object-file.c                       | 212 ++++++++++++++++++--
 object-name.c                       |  46 +++--
 object-name.h                       |   3 +-
 object-store-ll.h                   |   7 +-
 object.c                            |   2 +
 object.h                            |  18 ++
 oid-array.c                         |  12 +-
 pack-bitmap-write.c                 |   2 +-
 packfile.c                          |   3 +-
 reflog.c                            |   2 +-
 repository.c                        |  14 ++
 repository.h                        |   4 +
 revision.c                          |   4 +-
 setup.c                             |  22 +++
 setup.h                             |   1 +
 t/helper/test-delete-gpgsig.c       |  62 ++++++
 t/helper/test-tool.c                |   1 +
 t/helper/test-tool.h                |   1 +
 t/t1006-cat-file.sh                 | 379 +++++++++++++++++++++---------------
 t/t1016-compatObjectFormat.sh       | 281 ++++++++++++++++++++++++++
 t/t1016/gpg                         |   2 +
 t/test-lib-functions.sh             |  17 +-
 tree-walk.c                         |  58 +++---
 tree-walk.h                         |   7 +-
 tree.c                              |   2 +-
 walker.c                            |   2 +-
 62 files changed, 1844 insertions(+), 349 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v2 01/30] object-file-convert: stubs for converting from one object format to another
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2024-02-08  8:23     ` Linus Arver
  2024-02-15 11:21     ` Patrick Steinhardt
  2023-10-02  2:40   ` [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids Eric W. Biederman
                     ` (30 subsequent siblings)
  31 siblings, 2 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Two basic functions are provided:
- convert_object_file Takes an object file it's type and hash algorithm
  and converts it into the equivalent object file that would
  have been generated with hash algorithm "to".

  For blob objects there is no conversation to be done and it is an
  error to use this function on them.

  For commit, tree, and tag objects embedded oids are replaced by the
  oids of the objects they refer to with those objects and their
  object ids reencoded in with the hash algorithm "to".  Signatures
  are rearranged so that they remain valid after the object has
  been reencoded.

- repo_oid_to_algop which takes an oid that refers to an object file
  and returns the oid of the equivalent object file generated
  with the target hash algorithm.

The pair of files object-file-convert.c and object-file-convert.h are
introduced to hold as much of this logic as possible to keep this
conversion logic cleanly separated from everything else and in the
hopes that someday the code will be clean enough git can support
compiling out support for sha1 and the various conversion functions.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Makefile              |  1 +
 object-file-convert.c | 57 +++++++++++++++++++++++++++++++++++++++++++
 object-file-convert.h | 24 ++++++++++++++++++
 3 files changed, 82 insertions(+)
 create mode 100644 object-file-convert.c
 create mode 100644 object-file-convert.h

diff --git a/Makefile b/Makefile
index 577630936535..f7e824f25cda 100644
--- a/Makefile
+++ b/Makefile
@@ -1073,6 +1073,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += notes.o
+LIB_OBJS += object-file-convert.o
 LIB_OBJS += object-file.o
 LIB_OBJS += object-name.o
 LIB_OBJS += object.o
diff --git a/object-file-convert.c b/object-file-convert.c
new file mode 100644
index 000000000000..4777aba83636
--- /dev/null
+++ b/object-file-convert.c
@@ -0,0 +1,57 @@
+#include "git-compat-util.h"
+#include "gettext.h"
+#include "strbuf.h"
+#include "repository.h"
+#include "hash-ll.h"
+#include "object.h"
+#include "object-file-convert.h"
+
+int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
+		      const struct git_hash_algo *to, struct object_id *dest)
+{
+	/*
+	 * If the source algorithm is not set, then we're using the
+	 * default hash algorithm for that object.
+	 */
+	const struct git_hash_algo *from =
+		src->algo ? &hash_algos[src->algo] : repo->hash_algo;
+
+	if (from == to) {
+		if (src != dest)
+			oidcpy(dest, src);
+		return 0;
+	}
+	return -1;
+}
+
+int convert_object_file(struct strbuf *outbuf,
+			const struct git_hash_algo *from,
+			const struct git_hash_algo *to,
+			const void *buf, size_t len,
+			enum object_type type,
+			int gentle)
+{
+	int ret;
+
+	/* Don't call this function when no conversion is necessary */
+	if ((from == to) || (type == OBJ_BLOB))
+		BUG("Refusing noop object file conversion");
+
+	switch (type) {
+	case OBJ_COMMIT:
+	case OBJ_TREE:
+	case OBJ_TAG:
+	default:
+		/* Not implemented yet, so fail. */
+		ret = -1;
+		break;
+	}
+	if (!ret)
+		return 0;
+	if (gentle) {
+		strbuf_release(outbuf);
+		return ret;
+	}
+	die(_("Failed to convert object from %s to %s"),
+		from->name, to->name);
+}
diff --git a/object-file-convert.h b/object-file-convert.h
new file mode 100644
index 000000000000..a4f802aa8eea
--- /dev/null
+++ b/object-file-convert.h
@@ -0,0 +1,24 @@
+#ifndef OBJECT_CONVERT_H
+#define OBJECT_CONVERT_H
+
+struct repository;
+struct object_id;
+struct git_hash_algo;
+struct strbuf;
+#include "object.h"
+
+int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
+		      const struct git_hash_algo *to, struct object_id *dest);
+
+/*
+ * Convert an object file from one hash algorithm to another algorithm.
+ * Return -1 on failure, 0 on success.
+ */
+int convert_object_file(struct strbuf *outbuf,
+			const struct git_hash_algo *from,
+			const struct git_hash_algo *to,
+			const void *buf, size_t len,
+			enum object_type type,
+			int gentle);
+
+#endif /* OBJECT_CONVERT_H */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 01/30] object-file-convert: stubs for converting from one object format to another Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2024-02-13  8:16     ` Linus Arver
                       ` (2 more replies)
  2023-10-02  2:40   ` [PATCH v2 03/30] object-names: support input of oids in any supported hash Eric W. Biederman
                     ` (29 subsequent siblings)
  31 siblings, 3 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

While looking at how to handle input of both SHA-1 and SHA-256 oids in
get_oid_with_context, I realized that the oid_array in
repo_for_each_abbrev might have more than one kind of oid stored in it
simultaneously.

Update to oid_array_append to ensure that oids added to an oid array
always have an algorithm set.

Update void_hashcmp to first verify two oids use the same hash algorithm
before comparing them to each other.

With that oid-array should be safe to use with different kinds of
oids simultaneously.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 oid-array.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/oid-array.c b/oid-array.c
index 8e4717746c31..1f36651754ed 100644
--- a/oid-array.c
+++ b/oid-array.c
@@ -6,12 +6,20 @@ void oid_array_append(struct oid_array *array, const struct object_id *oid)
 {
 	ALLOC_GROW(array->oid, array->nr + 1, array->alloc);
 	oidcpy(&array->oid[array->nr++], oid);
+	if (!oid->algo)
+		oid_set_algo(&array->oid[array->nr - 1], the_hash_algo);
 	array->sorted = 0;
 }
 
-static int void_hashcmp(const void *a, const void *b)
+static int void_hashcmp(const void *va, const void *vb)
 {
-	return oidcmp(a, b);
+	const struct object_id *a = va, *b = vb;
+	int ret;
+	if (a->algo == b->algo)
+		ret = oidcmp(a, b);
+	else
+		ret = a->algo > b->algo ? 1 : -1;
+	return ret;
 }
 
 void oid_array_sort(struct oid_array *array)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 03/30] object-names: support input of oids in any supported hash
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 01/30] object-file-convert: stubs for converting from one object format to another Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2024-02-13  9:33     ` Linus Arver
  2024-02-15 11:21     ` Patrick Steinhardt
  2023-10-02  2:40   ` [PATCH v2 04/30] repository: add a compatibility hash algorithm Eric W. Biederman
                     ` (28 subsequent siblings)
  31 siblings, 2 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Support short oids encoded in any algorithm, while ensuring enough of
the oid is specified to disambiguate between all of the oids in the
repository encoded in any algorithm.

By default have the code continue to only accept oids specified in the
storage hash algorithm of the repository, but when something is
ambiguous display all of the possible oids from any accepted oid
encoding.

A new flag is added GET_OID_HASH_ANY that when supplied causes the
code to accept oids specified in any hash algorithm, and to return the
oids that were resolved.

This implements the functionality that allows both SHA-1 and SHA-256
object names, from the "Object names on the command line" section of
the hash function transition document.

Care is taken in get_short_oid so that when the result is ambiguous
the output remains the same if GIT_OID_HASH_ANY was not supplied.  If
GET_OID_HASH_ANY was supplied objects of any hash algorithm that match
the prefix are displayed.

This required updating repo_for_each_abbrev to give it a parameter so
that it knows to look at all hash algorithms.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/rev-parse.c |  2 +-
 hash-ll.h           |  1 +
 object-name.c       | 46 ++++++++++++++++++++++++++++++++++-----------
 object-name.h       |  3 ++-
 4 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index fde8861ca4e0..43e96765400c 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -882,7 +882,7 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				continue;
 			}
 			if (skip_prefix(arg, "--disambiguate=", &arg)) {
-				repo_for_each_abbrev(the_repository, arg,
+				repo_for_each_abbrev(the_repository, arg, the_hash_algo,
 						     show_abbrev, NULL);
 				continue;
 			}
diff --git a/hash-ll.h b/hash-ll.h
index 10d84cc20888..2cfde63ae1cf 100644
--- a/hash-ll.h
+++ b/hash-ll.h
@@ -145,6 +145,7 @@ struct object_id {
 #define GET_OID_RECORD_PATH     0200
 #define GET_OID_ONLY_TO_DIE    04000
 #define GET_OID_REQUIRE_PATH  010000
+#define GET_OID_HASH_ANY      020000
 
 #define GET_OID_DISAMBIGUATORS \
 	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
diff --git a/object-name.c b/object-name.c
index 0bfa29dbbfe9..7dd6e5e47566 100644
--- a/object-name.c
+++ b/object-name.c
@@ -25,6 +25,7 @@
 #include "midx.h"
 #include "commit-reach.h"
 #include "date.h"
+#include "object-file-convert.h"
 
 static int get_oid_oneline(struct repository *r, const char *, struct object_id *, struct commit_list *);
 
@@ -49,6 +50,7 @@ struct disambiguate_state {
 
 static void update_candidates(struct disambiguate_state *ds, const struct object_id *current)
 {
+	/* The hash algorithm of current has already been filtered */
 	if (ds->always_call_fn) {
 		ds->ambiguous = ds->fn(ds->repo, current, ds->cb_data) ? 1 : 0;
 		return;
@@ -134,6 +136,8 @@ static void unique_in_midx(struct multi_pack_index *m,
 {
 	uint32_t num, i, first = 0;
 	const struct object_id *current = NULL;
+	int len = ds->len > ds->repo->hash_algo->hexsz ?
+		ds->repo->hash_algo->hexsz : ds->len;
 	num = m->num_objects;
 
 	if (!num)
@@ -149,7 +153,7 @@ static void unique_in_midx(struct multi_pack_index *m,
 	for (i = first; i < num && !ds->ambiguous; i++) {
 		struct object_id oid;
 		current = nth_midxed_object_oid(&oid, m, i);
-		if (!match_hash(ds->len, ds->bin_pfx.hash, current->hash))
+		if (!match_hash(len, ds->bin_pfx.hash, current->hash))
 			break;
 		update_candidates(ds, current);
 	}
@@ -159,6 +163,8 @@ static void unique_in_pack(struct packed_git *p,
 			   struct disambiguate_state *ds)
 {
 	uint32_t num, i, first = 0;
+	int len = ds->len > ds->repo->hash_algo->hexsz ?
+		ds->repo->hash_algo->hexsz : ds->len;
 
 	if (p->multi_pack_index)
 		return;
@@ -177,7 +183,7 @@ static void unique_in_pack(struct packed_git *p,
 	for (i = first; i < num && !ds->ambiguous; i++) {
 		struct object_id oid;
 		nth_packed_object_id(&oid, p, i);
-		if (!match_hash(ds->len, ds->bin_pfx.hash, oid.hash))
+		if (!match_hash(len, ds->bin_pfx.hash, oid.hash))
 			break;
 		update_candidates(ds, &oid);
 	}
@@ -188,6 +194,10 @@ static void find_short_packed_object(struct disambiguate_state *ds)
 	struct multi_pack_index *m;
 	struct packed_git *p;
 
+	/* Skip, unless oids from the storage hash algorithm are wanted */
+	if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
+		return;
+
 	for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
 	     m = m->next)
 		unique_in_midx(m, ds);
@@ -326,11 +336,12 @@ int set_disambiguate_hint_config(const char *var, const char *value)
 
 static int init_object_disambiguation(struct repository *r,
 				      const char *name, int len,
+				      const struct git_hash_algo *algo,
 				      struct disambiguate_state *ds)
 {
 	int i;
 
-	if (len < MINIMUM_ABBREV || len > the_hash_algo->hexsz)
+	if (len < MINIMUM_ABBREV || len > GIT_MAX_HEXSZ)
 		return -1;
 
 	memset(ds, 0, sizeof(*ds));
@@ -357,6 +368,7 @@ static int init_object_disambiguation(struct repository *r,
 	ds->len = len;
 	ds->hex_pfx[len] = '\0';
 	ds->repo = r;
+	ds->bin_pfx.algo = algo ? hash_algo_by_ptr(algo) : GIT_HASH_UNKNOWN;
 	prepare_alt_odb(r);
 	return 0;
 }
@@ -491,9 +503,10 @@ static int repo_collect_ambiguous(struct repository *r UNUSED,
 	return collect_ambiguous(oid, data);
 }
 
-static int sort_ambiguous(const void *a, const void *b, void *ctx)
+static int sort_ambiguous(const void *va, const void *vb, void *ctx)
 {
 	struct repository *sort_ambiguous_repo = ctx;
+	const struct object_id *a = va, *b = vb;
 	int a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
 	int b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
 	int a_type_sort;
@@ -503,8 +516,12 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
 	 * Sorts by hash within the same object type, just as
 	 * oid_array_for_each_unique() would do.
 	 */
-	if (a_type == b_type)
-		return oidcmp(a, b);
+	if (a_type == b_type) {
+		if (a->algo == b->algo)
+			return oidcmp(a, b);
+		else
+			return a->algo > b->algo ? 1 : -1;
+	}
 
 	/*
 	 * Between object types show tags, then commits, and finally
@@ -533,8 +550,12 @@ static enum get_oid_result get_short_oid(struct repository *r,
 	int status;
 	struct disambiguate_state ds;
 	int quietly = !!(flags & GET_OID_QUIETLY);
+	const struct git_hash_algo *algo = r->hash_algo;
+
+	if (flags & GET_OID_HASH_ANY)
+		algo = NULL;
 
-	if (init_object_disambiguation(r, name, len, &ds) < 0)
+	if (init_object_disambiguation(r, name, len, algo, &ds) < 0)
 		return -1;
 
 	if (HAS_MULTI_BITS(flags & GET_OID_DISAMBIGUATORS))
@@ -588,7 +609,7 @@ static enum get_oid_result get_short_oid(struct repository *r,
 		if (!ds.ambiguous)
 			ds.fn = NULL;
 
-		repo_for_each_abbrev(r, ds.hex_pfx, collect_ambiguous, &collect);
+		repo_for_each_abbrev(r, ds.hex_pfx, algo, collect_ambiguous, &collect);
 		sort_ambiguous_oid_array(r, &collect);
 
 		if (oid_array_for_each(&collect, show_ambiguous_object, &out))
@@ -610,13 +631,14 @@ static enum get_oid_result get_short_oid(struct repository *r,
 }
 
 int repo_for_each_abbrev(struct repository *r, const char *prefix,
+			 const struct git_hash_algo *algo,
 			 each_abbrev_fn fn, void *cb_data)
 {
 	struct oid_array collect = OID_ARRAY_INIT;
 	struct disambiguate_state ds;
 	int ret;
 
-	if (init_object_disambiguation(r, prefix, strlen(prefix), &ds) < 0)
+	if (init_object_disambiguation(r, prefix, strlen(prefix), algo, &ds) < 0)
 		return -1;
 
 	ds.always_call_fn = 1;
@@ -787,10 +809,12 @@ void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
 int repo_find_unique_abbrev_r(struct repository *r, char *hex,
 			      const struct object_id *oid, int len)
 {
+	const struct git_hash_algo *algo =
+		oid->algo ? &hash_algos[oid->algo] : r->hash_algo;
 	struct disambiguate_state ds;
 	struct min_abbrev_data mad;
 	struct object_id oid_ret;
-	const unsigned hexsz = r->hash_algo->hexsz;
+	const unsigned hexsz = algo->hexsz;
 
 	if (len < 0) {
 		unsigned long count = repo_approximate_object_count(r);
@@ -826,7 +850,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
 
 	find_abbrev_len_packed(&mad);
 
-	if (init_object_disambiguation(r, hex, mad.cur_len, &ds) < 0)
+	if (init_object_disambiguation(r, hex, mad.cur_len, algo, &ds) < 0)
 		return -1;
 
 	ds.fn = repo_extend_abbrev_len;
diff --git a/object-name.h b/object-name.h
index 9ae522307148..064ddc97d1fe 100644
--- a/object-name.h
+++ b/object-name.h
@@ -67,7 +67,8 @@ enum get_oid_result get_oid_with_context(struct repository *repo, const char *st
 
 
 typedef int each_abbrev_fn(const struct object_id *oid, void *);
-int repo_for_each_abbrev(struct repository *r, const char *prefix, each_abbrev_fn, void *);
+int repo_for_each_abbrev(struct repository *r, const char *prefix,
+			 const struct git_hash_algo *algo, each_abbrev_fn, void *);
 
 int set_disambiguate_hint_config(const char *var, const char *value);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 04/30] repository: add a compatibility hash algorithm
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (2 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 03/30] object-names: support input of oids in any supported hash Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2024-02-13 10:02     ` Linus Arver
  2024-02-15 11:22     ` Patrick Steinhardt
  2023-10-02  2:40   ` [PATCH v2 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
                     ` (27 subsequent siblings)
  31 siblings, 2 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

We currently have support for using a full stage 4 SHA-256
implementation.  However, we'd like to support interoperability with
SHA-1 repositories as well.  The transition plan anticipates a
compatibility hash algorithm configuration option that we can use to
implement support for this.  Let's add an element to the repository
structure that indicates the compatibility hash algorithm so we can use
it when we need to consider interoperability between algorithms.

Add a helper function repo_set_compat_hash_algo that takes a
compatibility hash algorithm and sets "repo->compat_hash_algo".  If
GIT_HASH_UNKNOWN is passed as the compatibility hash algorithm
"repo->compat_hash_algo" is set to NULL.

For now, the code results in "repo->compat_hash_algo" always being set
to NULL, but that will change once a configuration option is added.

Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 repository.c | 8 ++++++++
 repository.h | 4 ++++
 setup.c      | 3 +++
 3 files changed, 15 insertions(+)

diff --git a/repository.c b/repository.c
index a7679ceeaa45..80252b79e93e 100644
--- a/repository.c
+++ b/repository.c
@@ -104,6 +104,13 @@ void repo_set_hash_algo(struct repository *repo, int hash_algo)
 	repo->hash_algo = &hash_algos[hash_algo];
 }
 
+void repo_set_compat_hash_algo(struct repository *repo, int algo)
+{
+	if (hash_algo_by_ptr(repo->hash_algo) == algo)
+		BUG("hash_algo and compat_hash_algo match");
+	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
+}
+
 /*
  * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
  * Return 0 upon success and a non-zero value upon failure.
@@ -184,6 +191,7 @@ int repo_init(struct repository *repo,
 		goto error;
 
 	repo_set_hash_algo(repo, format.hash_algo);
+	repo_set_compat_hash_algo(repo, GIT_HASH_UNKNOWN);
 	repo->repository_format_worktree_config = format.worktree_config;
 
 	/* take ownership of format.partial_clone */
diff --git a/repository.h b/repository.h
index 5f18486f6465..bf3fc601cc53 100644
--- a/repository.h
+++ b/repository.h
@@ -160,6 +160,9 @@ struct repository {
 	/* Repository's current hash algorithm, as serialized on disk. */
 	const struct git_hash_algo *hash_algo;
 
+	/* Repository's compatibility hash algorithm. */
+	const struct git_hash_algo *compat_hash_algo;
+
 	/* A unique-id for tracing purposes. */
 	int trace2_repo_id;
 
@@ -199,6 +202,7 @@ void repo_set_gitdir(struct repository *repo, const char *root,
 		     const struct set_gitdir_args *extra_args);
 void repo_set_worktree(struct repository *repo, const char *path);
 void repo_set_hash_algo(struct repository *repo, int algo);
+void repo_set_compat_hash_algo(struct repository *repo, int compat_algo);
 void initialize_the_repository(void);
 RESULT_MUST_BE_USED
 int repo_init(struct repository *r, const char *gitdir, const char *worktree);
diff --git a/setup.c b/setup.c
index 18927a847b86..aa8bf5da5226 100644
--- a/setup.c
+++ b/setup.c
@@ -1564,6 +1564,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		}
 		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			repo_set_compat_hash_algo(the_repository,
+						  GIT_HASH_UNKNOWN);
 			the_repository->repository_format_worktree_config =
 				repo_fmt.worktree_config;
 			/* take ownership of repo_fmt.partial_clone */
@@ -1657,6 +1659,7 @@ void check_repository_format(struct repository_format *fmt)
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	repo_set_hash_algo(the_repository, fmt->hash_algo);
+	repo_set_compat_hash_algo(the_repository, GIT_HASH_UNKNOWN);
 	the_repository->repository_format_worktree_config =
 		fmt->worktree_config;
 	the_repository->repository_format_partial_clone =
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (3 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 04/30] repository: add a compatibility hash algorithm Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2024-02-14  7:20     ` Linus Arver
  2024-02-15 11:22     ` Patrick Steinhardt
  2023-10-02  2:40   ` [PATCH v2 06/30] loose: compatibilty short name support Eric W. Biederman
                     ` (26 subsequent siblings)
  31 siblings, 2 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

As part of the transition plan, we'd like to add a file in the .git
directory that maps loose objects between SHA-1 and SHA-256.  Let's
implement the specification in the transition plan and store this data
on a per-repository basis in struct repository.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 Makefile              |   1 +
 loose.c               | 246 ++++++++++++++++++++++++++++++++++++++++++
 loose.h               |  22 ++++
 object-file-convert.c |  14 ++-
 object-store-ll.h     |   3 +
 object.c              |   2 +
 repository.c          |   6 ++
 7 files changed, 293 insertions(+), 1 deletion(-)
 create mode 100644 loose.c
 create mode 100644 loose.h

diff --git a/Makefile b/Makefile
index f7e824f25cda..3c18664def9a 100644
--- a/Makefile
+++ b/Makefile
@@ -1053,6 +1053,7 @@ LIB_OBJS += list-objects-filter.o
 LIB_OBJS += list-objects.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
+LIB_OBJS += loose.o
 LIB_OBJS += ls-refs.o
 LIB_OBJS += mailinfo.o
 LIB_OBJS += mailmap.o
diff --git a/loose.c b/loose.c
new file mode 100644
index 000000000000..6ba73cc84dca
--- /dev/null
+++ b/loose.c
@@ -0,0 +1,246 @@
+#include "git-compat-util.h"
+#include "hash.h"
+#include "path.h"
+#include "object-store.h"
+#include "hex.h"
+#include "wrapper.h"
+#include "gettext.h"
+#include "loose.h"
+#include "lockfile.h"
+
+static const char *loose_object_header = "# loose-object-idx\n";
+
+static inline int should_use_loose_object_map(struct repository *repo)
+{
+	return repo->compat_hash_algo && repo->gitdir;
+}
+
+void loose_object_map_init(struct loose_object_map **map)
+{
+	struct loose_object_map *m;
+	m = xmalloc(sizeof(**map));
+	m->to_compat = kh_init_oid_map();
+	m->to_storage = kh_init_oid_map();
+	*map = m;
+}
+
+static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const struct object_id *value)
+{
+	khiter_t pos;
+	int ret;
+	struct object_id *stored;
+
+	pos = kh_put_oid_map(map, *key, &ret);
+
+	/* This item already exists in the map. */
+	if (ret == 0)
+		return 0;
+
+	stored = xmalloc(sizeof(*stored));
+	oidcpy(stored, value);
+	kh_value(map, pos) = stored;
+	return 1;
+}
+
+static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir)
+{
+	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
+	FILE *fp;
+
+	if (!dir->loose_map)
+		loose_object_map_init(&dir->loose_map);
+
+	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
+	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree);
+
+	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
+	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob);
+
+	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
+	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid);
+
+	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
+	fp = fopen(path.buf, "rb");
+	if (!fp) {
+		strbuf_release(&path);
+		return 0;
+	}
+
+	errno = 0;
+	if (strbuf_getwholeline(&buf, fp, '\n') || strcmp(buf.buf, loose_object_header))
+		goto err;
+	while (!strbuf_getline_lf(&buf, fp)) {
+		const char *p;
+		struct object_id oid, compat_oid;
+		if (parse_oid_hex_algop(buf.buf, &oid, &p, repo->hash_algo) ||
+		    *p++ != ' ' ||
+		    parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) ||
+		    p != buf.buf + buf.len)
+			goto err;
+		insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid);
+		insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid);
+	}
+
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return errno ? -1 : 0;
+err:
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return -1;
+}
+
+int repo_read_loose_object_map(struct repository *repo)
+{
+	struct object_directory *dir;
+
+	if (!should_use_loose_object_map(repo))
+		return 0;
+
+	prepare_alt_odb(repo);
+
+	for (dir = repo->objects->odb; dir; dir = dir->next) {
+		if (load_one_loose_object_map(repo, dir) < 0) {
+			return -1;
+		}
+	}
+	return 0;
+}
+
+int repo_write_loose_object_map(struct repository *repo)
+{
+	kh_oid_map_t *map = repo->objects->odb->loose_map->to_compat;
+	struct lock_file lock;
+	int fd;
+	khiter_t iter;
+	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
+
+	if (!should_use_loose_object_map(repo))
+		return 0;
+
+	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
+	fd = hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1);
+	iter = kh_begin(map);
+	if (write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0)
+		goto errout;
+
+	for (; iter != kh_end(map); iter++) {
+		if (kh_exist(map, iter)) {
+			if (oideq(&kh_key(map, iter), the_hash_algo->empty_tree) ||
+			    oideq(&kh_key(map, iter), the_hash_algo->empty_blob))
+				continue;
+			strbuf_addf(&buf, "%s %s\n", oid_to_hex(&kh_key(map, iter)), oid_to_hex(kh_value(map, iter)));
+			if (write_in_full(fd, buf.buf, buf.len) < 0)
+				goto errout;
+			strbuf_reset(&buf);
+		}
+	}
+	strbuf_release(&buf);
+	if (commit_lock_file(&lock) < 0) {
+		error_errno(_("could not write loose object index %s"), path.buf);
+		strbuf_release(&path);
+		return -1;
+	}
+	strbuf_release(&path);
+	return 0;
+errout:
+	rollback_lock_file(&lock);
+	strbuf_release(&buf);
+	error_errno(_("failed to write loose object index %s\n"), path.buf);
+	strbuf_release(&path);
+	return -1;
+}
+
+static int write_one_object(struct repository *repo, const struct object_id *oid,
+			    const struct object_id *compat_oid)
+{
+	struct lock_file lock;
+	int fd;
+	struct stat st;
+	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
+
+	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
+	hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1);
+
+	fd = open(path.buf, O_WRONLY | O_CREAT | O_APPEND, 0666);
+	if (fd < 0)
+		goto errout;
+	if (fstat(fd, &st) < 0)
+		goto errout;
+	if (!st.st_size && write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0)
+		goto errout;
+
+	strbuf_addf(&buf, "%s %s\n", oid_to_hex(oid), oid_to_hex(compat_oid));
+	if (write_in_full(fd, buf.buf, buf.len) < 0)
+		goto errout;
+	if (close(fd))
+		goto errout;
+	adjust_shared_perm(path.buf);
+	rollback_lock_file(&lock);
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return 0;
+errout:
+	error_errno(_("failed to write loose object index %s\n"), path.buf);
+	close(fd);
+	rollback_lock_file(&lock);
+	strbuf_release(&buf);
+	strbuf_release(&path);
+	return -1;
+}
+
+int repo_add_loose_object_map(struct repository *repo, const struct object_id *oid,
+			      const struct object_id *compat_oid)
+{
+	int inserted = 0;
+
+	if (!should_use_loose_object_map(repo))
+		return 0;
+
+	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_compat, oid, compat_oid);
+	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_storage, compat_oid, oid);
+	if (inserted)
+		return write_one_object(repo, oid, compat_oid);
+	return 0;
+}
+
+int repo_loose_object_map_oid(struct repository *repo,
+			      const struct object_id *src,
+			      const struct git_hash_algo *to,
+			      struct object_id *dest)
+{
+	struct object_directory *dir;
+	kh_oid_map_t *map;
+	khiter_t pos;
+
+	for (dir = repo->objects->odb; dir; dir = dir->next) {
+		struct loose_object_map *loose_map = dir->loose_map;
+		if (!loose_map)
+			continue;
+		map = (to == repo->compat_hash_algo) ?
+			loose_map->to_compat :
+			loose_map->to_storage;
+		pos = kh_get_oid_map(map, *src);
+		if (pos < kh_end(map)) {
+			oidcpy(dest, kh_value(map, pos));
+			return 0;
+		}
+	}
+	return -1;
+}
+
+void loose_object_map_clear(struct loose_object_map **map)
+{
+	struct loose_object_map *m = *map;
+	struct object_id *oid;
+
+	if (!m)
+		return;
+
+	kh_foreach_value(m->to_compat, oid, free(oid));
+	kh_foreach_value(m->to_storage, oid, free(oid));
+	kh_destroy_oid_map(m->to_compat);
+	kh_destroy_oid_map(m->to_storage);
+	free(m);
+	*map = NULL;
+}
diff --git a/loose.h b/loose.h
new file mode 100644
index 000000000000..2c2957072c5f
--- /dev/null
+++ b/loose.h
@@ -0,0 +1,22 @@
+#ifndef LOOSE_H
+#define LOOSE_H
+
+#include "khash.h"
+
+struct loose_object_map {
+	kh_oid_map_t *to_compat;
+	kh_oid_map_t *to_storage;
+};
+
+void loose_object_map_init(struct loose_object_map **map);
+void loose_object_map_clear(struct loose_object_map **map);
+int repo_loose_object_map_oid(struct repository *repo,
+			      const struct object_id *src,
+			      const struct git_hash_algo *dest_algo,
+			      struct object_id *dest);
+int repo_add_loose_object_map(struct repository *repo, const struct object_id *oid,
+			      const struct object_id *compat_oid);
+int repo_read_loose_object_map(struct repository *repo);
+int repo_write_loose_object_map(struct repository *repo);
+
+#endif
diff --git a/object-file-convert.c b/object-file-convert.c
index 4777aba83636..1ec945eaa17f 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -4,6 +4,7 @@
 #include "repository.h"
 #include "hash-ll.h"
 #include "object.h"
+#include "loose.h"
 #include "object-file-convert.h"
 
 int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
@@ -21,7 +22,18 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 			oidcpy(dest, src);
 		return 0;
 	}
-	return -1;
+	if (repo_loose_object_map_oid(repo, src, to, dest)) {
+		/*
+		 * We may have loaded the object map at repo initialization but
+		 * another process (perhaps upstream of a pipe from us) may have
+		 * written a new object into the map.  If the object is missing,
+		 * let's reload the map to see if the object has appeared.
+		 */
+		repo_read_loose_object_map(repo);
+		if (repo_loose_object_map_oid(repo, src, to, dest))
+			return -1;
+	}
+	return 0;
 }
 
 int convert_object_file(struct strbuf *outbuf,
diff --git a/object-store-ll.h b/object-store-ll.h
index 26a3895c821c..bc76d6bec80d 100644
--- a/object-store-ll.h
+++ b/object-store-ll.h
@@ -26,6 +26,9 @@ struct object_directory {
 	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
 	struct oidtree *loose_objects_cache;
 
+	/* Map between object IDs for loose objects. */
+	struct loose_object_map *loose_map;
+
 	/*
 	 * This is a temporary object store created by the tmp_objdir
 	 * facility. Disable ref updates since the objects in the store
diff --git a/object.c b/object.c
index 2c61e4c86217..186a0a47c0fb 100644
--- a/object.c
+++ b/object.c
@@ -13,6 +13,7 @@
 #include "alloc.h"
 #include "packfile.h"
 #include "commit-graph.h"
+#include "loose.h"
 
 unsigned int get_max_object_index(void)
 {
@@ -540,6 +541,7 @@ void free_object_directory(struct object_directory *odb)
 {
 	free(odb->path);
 	odb_clear_loose_cache(odb);
+	loose_object_map_clear(&odb->loose_map);
 	free(odb);
 }
 
diff --git a/repository.c b/repository.c
index 80252b79e93e..6214f61cf4e7 100644
--- a/repository.c
+++ b/repository.c
@@ -14,6 +14,7 @@
 #include "read-cache-ll.h"
 #include "remote.h"
 #include "setup.h"
+#include "loose.h"
 #include "submodule-config.h"
 #include "sparse-index.h"
 #include "trace2.h"
@@ -109,6 +110,8 @@ void repo_set_compat_hash_algo(struct repository *repo, int algo)
 	if (hash_algo_by_ptr(repo->hash_algo) == algo)
 		BUG("hash_algo and compat_hash_algo match");
 	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
+	if (repo->compat_hash_algo)
+		repo_read_loose_object_map(repo);
 }
 
 /*
@@ -201,6 +204,9 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	if (repo->compat_hash_algo)
+		repo_read_loose_object_map(repo);
+
 	clear_repository_format(&format);
 	return 0;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 06/30] loose: compatibilty short name support
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (4 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2024-02-15 11:22     ` Patrick Steinhardt
  2023-10-02  2:40   ` [PATCH v2 07/30] object-file: update the loose object map when writing loose objects Eric W. Biederman
                     ` (25 subsequent siblings)
  31 siblings, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Update loose_objects_cache when udpating the loose objects map.  This
oidtree is used to discover which oids are possibilities when
resolving short names, and it can support a mixture of sha1
and sha256 oids.

With this any oid recorded objects/loose-objects-idx is usable
for resolving an oid to an object.

To make this maintainable a helper insert_loose_map is factored
out of load_one_loose_object_map and repo_add_loose_object_map,
and then modified to also update the loose_objects_cache.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 loose.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/loose.c b/loose.c
index 6ba73cc84dca..f6faa6216a08 100644
--- a/loose.c
+++ b/loose.c
@@ -7,6 +7,7 @@
 #include "gettext.h"
 #include "loose.h"
 #include "lockfile.h"
+#include "oidtree.h"
 
 static const char *loose_object_header = "# loose-object-idx\n";
 
@@ -42,6 +43,21 @@ static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const
 	return 1;
 }
 
+static int insert_loose_map(struct object_directory *odb,
+			    const struct object_id *oid,
+			    const struct object_id *compat_oid)
+{
+	struct loose_object_map *map = odb->loose_map;
+	int inserted = 0;
+
+	inserted |= insert_oid_pair(map->to_compat, oid, compat_oid);
+	inserted |= insert_oid_pair(map->to_storage, compat_oid, oid);
+	if (inserted)
+		oidtree_insert(odb->loose_objects_cache, compat_oid);
+
+	return inserted;
+}
+
 static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir)
 {
 	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
@@ -49,15 +65,14 @@ static int load_one_loose_object_map(struct repository *repo, struct object_dire
 
 	if (!dir->loose_map)
 		loose_object_map_init(&dir->loose_map);
+	if (!dir->loose_objects_cache) {
+		ALLOC_ARRAY(dir->loose_objects_cache, 1);
+		oidtree_init(dir->loose_objects_cache);
+	}
 
-	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
-	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree);
-
-	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
-	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob);
-
-	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
-	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid);
+	insert_loose_map(dir, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
+	insert_loose_map(dir, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
+	insert_loose_map(dir, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
 
 	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
 	fp = fopen(path.buf, "rb");
@@ -77,8 +92,7 @@ static int load_one_loose_object_map(struct repository *repo, struct object_dire
 		    parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) ||
 		    p != buf.buf + buf.len)
 			goto err;
-		insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid);
-		insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid);
+		insert_loose_map(dir, &oid, &compat_oid);
 	}
 
 	strbuf_release(&buf);
@@ -197,8 +211,7 @@ int repo_add_loose_object_map(struct repository *repo, const struct object_id *o
 	if (!should_use_loose_object_map(repo))
 		return 0;
 
-	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_compat, oid, compat_oid);
-	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_storage, compat_oid, oid);
+	inserted = insert_loose_map(repo->objects->odb, oid, compat_oid);
 	if (inserted)
 		return write_one_object(repo, oid, compat_oid);
 	return 0;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 07/30] object-file: update the loose object map when writing loose objects
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (5 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 06/30] loose: compatibilty short name support Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2024-02-15 11:22     ` Patrick Steinhardt
  2023-10-02  2:40   ` [PATCH v2 08/30] object-file: add a compat_oid_in parameter to write_object_file_flags Eric W. Biederman
                     ` (24 subsequent siblings)
  31 siblings, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

To implement SHA1 compatibility on SHA256 repositories the loose
object map needs to be updated whenver a loose object is written.
Updating the loose object map this way allows git to support
the old hash algorithm in constant time.

The functions write_loose_object, and stream_loose_object are
the only two functions that write to the loose object store.

Update stream_loose_object to compute the compatibiilty hash, update
the loose object, and then call repo_add_loose_object_map to update
the loose object map.

Update write_object_file_flags to convert the object into
it's compatibility encoding, hash the compatibility encoding,
write the object, and then update the loose object map.

Update force_object_loose to lookup the hash of the compatibility
encoding, write the loose object, and then update the loose object
map.

Update write_object_file_literally to convert the object into it's
compatibility hash encoding, hash the compatibility enconding, write
the object, and then update the loose object map, when the type string
is a known type.  For objects with an unknown type this results in a
partially broken repository, as the objects are not mapped.

The point of write_object_file_literally is to generate a partially
broken repository for testing.  For testing skipping writing the loose
object map is much more useful than refusing to write the broken
object at all.

Except that the loose objects are updated before the loose object map
I have not done any analysis to see how robust this scheme is in the
event of failure.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 object-file.c | 113 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 95 insertions(+), 18 deletions(-)

diff --git a/object-file.c b/object-file.c
index 7dc0c4bfbba8..4e55f475b3b4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -43,6 +43,8 @@
 #include "setup.h"
 #include "submodule.h"
 #include "fsck.h"
+#include "loose.h"
+#include "object-file-convert.h"
 
 /* The maximum size for an object header. */
 #define MAX_HEADER_LEN 32
@@ -1952,9 +1954,12 @@ static int start_loose_object_common(struct strbuf *tmp_file,
 				     const char *filename, unsigned flags,
 				     git_zstream *stream,
 				     unsigned char *buf, size_t buflen,
-				     git_hash_ctx *c,
+				     git_hash_ctx *c, git_hash_ctx *compat_c,
 				     char *hdr, int hdrlen)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	int fd;
 
 	fd = create_tmpfile(tmp_file, filename);
@@ -1974,14 +1979,18 @@ static int start_loose_object_common(struct strbuf *tmp_file,
 	git_deflate_init(stream, zlib_compression_level);
 	stream->next_out = buf;
 	stream->avail_out = buflen;
-	the_hash_algo->init_fn(c);
+	algo->init_fn(c);
+	if (compat && compat_c)
+		compat->init_fn(compat_c);
 
 	/*  Start to feed header to zlib stream */
 	stream->next_in = (unsigned char *)hdr;
 	stream->avail_in = hdrlen;
 	while (git_deflate(stream, 0) == Z_OK)
 		; /* nothing */
-	the_hash_algo->update_fn(c, hdr, hdrlen);
+	algo->update_fn(c, hdr, hdrlen);
+	if (compat && compat_c)
+		compat->update_fn(compat_c, hdr, hdrlen);
 
 	return fd;
 }
@@ -1990,16 +1999,21 @@ static int start_loose_object_common(struct strbuf *tmp_file,
  * Common steps for the inner git_deflate() loop for writing loose
  * objects. Returns what git_deflate() returns.
  */
-static int write_loose_object_common(git_hash_ctx *c,
+static int write_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c,
 				     git_zstream *stream, const int flush,
 				     unsigned char *in0, const int fd,
 				     unsigned char *compressed,
 				     const size_t compressed_len)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	int ret;
 
 	ret = git_deflate(stream, flush ? Z_FINISH : 0);
-	the_hash_algo->update_fn(c, in0, stream->next_in - in0);
+	algo->update_fn(c, in0, stream->next_in - in0);
+	if (compat && compat_c)
+		compat->update_fn(compat_c, in0, stream->next_in - in0);
 	if (write_in_full(fd, compressed, stream->next_out - compressed) < 0)
 		die_errno(_("unable to write loose object file"));
 	stream->next_out = compressed;
@@ -2014,15 +2028,21 @@ static int write_loose_object_common(git_hash_ctx *c,
  * - End the compression of zlib stream.
  * - Get the calculated oid to "oid".
  */
-static int end_loose_object_common(git_hash_ctx *c, git_zstream *stream,
-				   struct object_id *oid)
+static int end_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c,
+				   git_zstream *stream, struct object_id *oid,
+				   struct object_id *compat_oid)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	int ret;
 
 	ret = git_deflate_end_gently(stream);
 	if (ret != Z_OK)
 		return ret;
-	the_hash_algo->final_oid_fn(oid, c);
+	algo->final_oid_fn(oid, c);
+	if (compat && compat_c)
+		compat->final_oid_fn(compat_oid, compat_c);
 
 	return Z_OK;
 }
@@ -2046,7 +2066,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 
 	fd = start_loose_object_common(&tmp_file, filename.buf, flags,
 				       &stream, compressed, sizeof(compressed),
-				       &c, hdr, hdrlen);
+				       &c, NULL, hdr, hdrlen);
 	if (fd < 0)
 		return -1;
 
@@ -2056,14 +2076,14 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	do {
 		unsigned char *in0 = stream.next_in;
 
-		ret = write_loose_object_common(&c, &stream, 1, in0, fd,
+		ret = write_loose_object_common(&c, NULL, &stream, 1, in0, fd,
 						compressed, sizeof(compressed));
 	} while (ret == Z_OK);
 
 	if (ret != Z_STREAM_END)
 		die(_("unable to deflate new object %s (%d)"), oid_to_hex(oid),
 		    ret);
-	ret = end_loose_object_common(&c, &stream, &parano_oid);
+	ret = end_loose_object_common(&c, NULL, &stream, &parano_oid, NULL);
 	if (ret != Z_OK)
 		die(_("deflateEnd on object %s failed (%d)"), oid_to_hex(oid),
 		    ret);
@@ -2108,10 +2128,12 @@ static int freshen_packed_object(const struct object_id *oid)
 int stream_loose_object(struct input_stream *in_stream, size_t len,
 			struct object_id *oid)
 {
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
+	struct object_id compat_oid;
 	int fd, ret, err = 0, flush = 0;
 	unsigned char compressed[4096];
 	git_zstream stream;
-	git_hash_ctx c;
+	git_hash_ctx c, compat_c;
 	struct strbuf tmp_file = STRBUF_INIT;
 	struct strbuf filename = STRBUF_INIT;
 	int dirlen;
@@ -2135,7 +2157,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 	 */
 	fd = start_loose_object_common(&tmp_file, filename.buf, 0,
 				       &stream, compressed, sizeof(compressed),
-				       &c, hdr, hdrlen);
+				       &c, &compat_c, hdr, hdrlen);
 	if (fd < 0) {
 		err = -1;
 		goto cleanup;
@@ -2153,7 +2175,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 			if (in_stream->is_finished)
 				flush = 1;
 		}
-		ret = write_loose_object_common(&c, &stream, flush, in0, fd,
+		ret = write_loose_object_common(&c, &compat_c, &stream, flush, in0, fd,
 						compressed, sizeof(compressed));
 		/*
 		 * Unlike write_loose_object(), we do not have the entire
@@ -2176,7 +2198,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 	 */
 	if (ret != Z_STREAM_END)
 		die(_("unable to stream deflate new object (%d)"), ret);
-	ret = end_loose_object_common(&c, &stream, oid);
+	ret = end_loose_object_common(&c, &compat_c, &stream, oid, &compat_oid);
 	if (ret != Z_OK)
 		die(_("deflateEnd on stream object failed (%d)"), ret);
 	close_loose_object(fd, tmp_file.buf);
@@ -2203,6 +2225,8 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 	}
 
 	err = finalize_object_file(tmp_file.buf, filename.buf);
+	if (!err && compat)
+		err = repo_add_loose_object_map(the_repository, oid, &compat_oid);
 cleanup:
 	strbuf_release(&tmp_file);
 	strbuf_release(&filename);
@@ -2213,17 +2237,38 @@ int write_object_file_flags(const void *buf, unsigned long len,
 			    enum object_type type, struct object_id *oid,
 			    unsigned flags)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
+	struct object_id compat_oid;
 	char hdr[MAX_HEADER_LEN];
 	int hdrlen = sizeof(hdr);
 
+	/* Generate compat_oid */
+	if (compat) {
+		if (type == OBJ_BLOB)
+			hash_object_file(compat, buf, len, type, &compat_oid);
+		else {
+			struct strbuf converted = STRBUF_INIT;
+			convert_object_file(&converted, algo, compat,
+					    buf, len, type, 0);
+			hash_object_file(compat, converted.buf, converted.len,
+					 type, &compat_oid);
+			strbuf_release(&converted);
+		}
+	}
+
 	/* Normally if we have it in the pack then we do not bother writing
 	 * it out into .git/objects/??/?{38} file.
 	 */
-	write_object_file_prepare(the_hash_algo, buf, len, type, oid, hdr,
-				  &hdrlen);
+	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
 		return 0;
-	return write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags);
+	if (write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags))
+		return -1;
+	if (compat)
+		return repo_add_loose_object_map(repo, oid, &compat_oid);
+	return 0;
 }
 
 int write_object_file_literally(const void *buf, unsigned long len,
@@ -2231,7 +2276,27 @@ int write_object_file_literally(const void *buf, unsigned long len,
 				unsigned flags)
 {
 	char *header;
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *algo = repo->hash_algo;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
+	struct object_id compat_oid;
 	int hdrlen, status = 0;
+	int compat_type = -1;
+
+	if (compat) {
+		compat_type = type_from_string_gently(type, -1, 1);
+		if (compat_type == OBJ_BLOB)
+			hash_object_file(compat, buf, len, compat_type,
+					 &compat_oid);
+		else if (compat_type != -1) {
+			struct strbuf converted = STRBUF_INIT;
+			convert_object_file(&converted, algo, compat,
+					    buf, len, compat_type, 0);
+			hash_object_file(compat, converted.buf, converted.len,
+					 compat_type, &compat_oid);
+			strbuf_release(&converted);
+		}
+	}
 
 	/* type string, SP, %lu of the length plus NUL must fit this */
 	hdrlen = strlen(type) + MAX_HEADER_LEN;
@@ -2244,6 +2309,8 @@ int write_object_file_literally(const void *buf, unsigned long len,
 	if (freshen_packed_object(oid) || freshen_loose_object(oid))
 		goto cleanup;
 	status = write_loose_object(oid, header, hdrlen, buf, len, 0, 0);
+	if (compat_type != -1)
+		return repo_add_loose_object_map(repo, oid, &compat_oid);
 
 cleanup:
 	free(header);
@@ -2252,9 +2319,12 @@ int write_object_file_literally(const void *buf, unsigned long len,
 
 int force_object_loose(const struct object_id *oid, time_t mtime)
 {
+	struct repository *repo = the_repository;
+	const struct git_hash_algo *compat = repo->compat_hash_algo;
 	void *buf;
 	unsigned long len;
 	struct object_info oi = OBJECT_INFO_INIT;
+	struct object_id compat_oid;
 	enum object_type type;
 	char hdr[MAX_HEADER_LEN];
 	int hdrlen;
@@ -2267,8 +2337,15 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
 	oi.contentp = &buf;
 	if (oid_object_info_extended(the_repository, oid, &oi, 0))
 		return error(_("cannot read object for %s"), oid_to_hex(oid));
+	if (compat) {
+		if (repo_oid_to_algop(repo, oid, compat, &compat_oid))
+			return error(_("cannot map object %s to %s"),
+				     oid_to_hex(oid), compat->name);
+	}
 	hdrlen = format_object_header(hdr, sizeof(hdr), type, len);
 	ret = write_loose_object(oid, hdr, hdrlen, buf, len, mtime, 0);
+	if (!ret && compat)
+		ret = repo_add_loose_object_map(the_repository, oid, &compat_oid);
 	free(buf);
 
 	return ret;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 08/30] object-file: add a compat_oid_in parameter to write_object_file_flags
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (6 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 07/30] object-file: update the loose object map when writing loose objects Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 09/30] commit: write commits for both hashes Eric W. Biederman
                     ` (23 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

To create the proper signatures for commit objects both versions of
the commit object need to be generated and signed.  After that it is
a waste to throw away the work of generating the compatibility hash
so update write_object_file_flags to take a compatibility hash input
parameter that it can use to skip the work of generating the
compatability hash.

Update the places that don't generate the compatability hash to
pass NULL so it is easy to tell write_object_file_flags should
not attempt to use their compatability hash.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 cache-tree.c      | 2 +-
 object-file.c     | 6 ++++--
 object-store-ll.h | 4 ++--
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 641427ed410a..ddc7d3d86959 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -448,7 +448,7 @@ static int update_one(struct cache_tree *it,
 		hash_object_file(the_hash_algo, buffer.buf, buffer.len,
 				 OBJ_TREE, &it->oid);
 	} else if (write_object_file_flags(buffer.buf, buffer.len, OBJ_TREE,
-					   &it->oid, flags & WRITE_TREE_SILENT
+					   &it->oid, NULL, flags & WRITE_TREE_SILENT
 					   ? HASH_SILENT : 0)) {
 		strbuf_release(&buffer);
 		return -1;
diff --git a/object-file.c b/object-file.c
index 4e55f475b3b4..820810a5f4b3 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2235,7 +2235,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
 
 int write_object_file_flags(const void *buf, unsigned long len,
 			    enum object_type type, struct object_id *oid,
-			    unsigned flags)
+			    struct object_id *compat_oid_in, unsigned flags)
 {
 	struct repository *repo = the_repository;
 	const struct git_hash_algo *algo = repo->hash_algo;
@@ -2246,7 +2246,9 @@ int write_object_file_flags(const void *buf, unsigned long len,
 
 	/* Generate compat_oid */
 	if (compat) {
-		if (type == OBJ_BLOB)
+		if (compat_oid_in)
+			oidcpy(&compat_oid, compat_oid_in);
+		else if (type == OBJ_BLOB)
 			hash_object_file(compat, buf, len, type, &compat_oid);
 		else {
 			struct strbuf converted = STRBUF_INIT;
diff --git a/object-store-ll.h b/object-store-ll.h
index bc76d6bec80d..c5f2bb2fc2fe 100644
--- a/object-store-ll.h
+++ b/object-store-ll.h
@@ -255,11 +255,11 @@ void hash_object_file(const struct git_hash_algo *algo, const void *buf,
 
 int write_object_file_flags(const void *buf, unsigned long len,
 			    enum object_type type, struct object_id *oid,
-			    unsigned flags);
+			    struct object_id *comapt_oid_in, unsigned flags);
 static inline int write_object_file(const void *buf, unsigned long len,
 				    enum object_type type, struct object_id *oid)
 {
-	return write_object_file_flags(buf, len, type, oid, 0);
+	return write_object_file_flags(buf, len, type, oid, NULL, 0);
 }
 
 int write_object_file_literally(const void *buf, unsigned long len,
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 09/30] commit: write commits for both hashes
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (7 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 08/30] object-file: add a compat_oid_in parameter to write_object_file_flags Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 10/30] commit: convert mergetag before computing the signature of a commit Eric W. Biederman
                     ` (22 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

When we write a commit, we include data that is specific to the hash
algorithm, such as parents and the root tree.  In order to write both a
SHA-1 commit and a SHA-256 version, we need to convert between them.

However, a straightforward conversion isn't necessarily what we want.
When we sign a commit, we sign its data, so if we create a commit for
SHA-256 and then write a SHA-1 version, we'll still have only signed the
SHA-256 data.  While this is valid, it would be better to sign both
forms of data so people using SHA-1 can verify the signatures as well.

Consequently, we don't want to use the standard mapping that occurs when
we write an object.  Instead, let's move most of the writing of the
commit into a separate function which is agnostic of the hash algorithm
and which simply writes into a buffer and specify both versions of the
object ourselves.

We can then call this function twice: once with the SHA-256 contents,
and if SHA-1 is enabled, once with the SHA-1 contents.  If we're signing
the commit, we then sign both versions and append both signatures to
both buffers.  To produce a consistent hash, we always append the
signatures in the order in which Git implemented them: first SHA-1, then
SHA-256.

In order to make this signing code work, we split the commit signing
code into two functions, one which signs the buffer, and one which
appends the signature.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 commit.c | 181 +++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 136 insertions(+), 45 deletions(-)

diff --git a/commit.c b/commit.c
index b3223478bc2a..6765f3a82b9d 100644
--- a/commit.c
+++ b/commit.c
@@ -28,6 +28,7 @@
 #include "shallow.h"
 #include "tree.h"
 #include "hook.h"
+#include "object-file-convert.h"
 
 static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
 
@@ -1100,12 +1101,11 @@ static const char *gpg_sig_headers[] = {
 	"gpgsig-sha256",
 };
 
-int sign_with_header(struct strbuf *buf, const char *keyid)
+static int add_commit_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
 {
-	struct strbuf sig = STRBUF_INIT;
 	int inspos, copypos;
 	const char *eoh;
-	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(the_hash_algo)];
+	const char *gpg_sig_header = gpg_sig_headers[hash_algo_by_ptr(algo)];
 	int gpg_sig_header_len = strlen(gpg_sig_header);
 
 	/* find the end of the header */
@@ -1115,15 +1115,8 @@ int sign_with_header(struct strbuf *buf, const char *keyid)
 	else
 		inspos = eoh - buf->buf + 1;
 
-	if (!keyid || !*keyid)
-		keyid = get_signing_key();
-	if (sign_buffer(buf, &sig, keyid)) {
-		strbuf_release(&sig);
-		return -1;
-	}
-
-	for (copypos = 0; sig.buf[copypos]; ) {
-		const char *bol = sig.buf + copypos;
+	for (copypos = 0; sig->buf[copypos]; ) {
+		const char *bol = sig->buf + copypos;
 		const char *eol = strchrnul(bol, '\n');
 		int len = (eol - bol) + !!*eol;
 
@@ -1136,11 +1129,17 @@ int sign_with_header(struct strbuf *buf, const char *keyid)
 		inspos += len;
 		copypos += len;
 	}
-	strbuf_release(&sig);
 	return 0;
 }
 
-
+static int sign_commit_to_strbuf(struct strbuf *sig, struct strbuf *buf, const char *keyid)
+{
+	if (!keyid || !*keyid)
+		keyid = get_signing_key();
+	if (sign_buffer(buf, sig, keyid))
+		return -1;
+	return 0;
+}
 
 int parse_signed_commit(const struct commit *commit,
 			struct strbuf *payload, struct strbuf *signature,
@@ -1599,70 +1598,162 @@ N_("Warning: commit message did not conform to UTF-8.\n"
    "You may want to amend it after fixing the message, or set the config\n"
    "variable i18n.commitEncoding to the encoding your project uses.\n");
 
-int commit_tree_extended(const char *msg, size_t msg_len,
-			 const struct object_id *tree,
-			 struct commit_list *parents, struct object_id *ret,
-			 const char *author, const char *committer,
-			 const char *sign_commit,
-			 struct commit_extra_header *extra)
+static void write_commit_tree(struct strbuf *buffer, const char *msg, size_t msg_len,
+			      const struct object_id *tree,
+			      const struct object_id *parents, size_t parents_len,
+			      const char *author, const char *committer,
+			      struct commit_extra_header *extra)
 {
-	int result;
 	int encoding_is_utf8;
-	struct strbuf buffer;
-
-	assert_oid_type(tree, OBJ_TREE);
-
-	if (memchr(msg, '\0', msg_len))
-		return error("a NUL byte in commit log message not allowed.");
+	size_t i;
 
 	/* Not having i18n.commitencoding is the same as having utf-8 */
 	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
 
-	strbuf_init(&buffer, 8192); /* should avoid reallocs for the headers */
-	strbuf_addf(&buffer, "tree %s\n", oid_to_hex(tree));
+	strbuf_grow(buffer, 8192); /* should avoid reallocs for the headers */
+	strbuf_addf(buffer, "tree %s\n", oid_to_hex(tree));
 
 	/*
 	 * NOTE! This ordering means that the same exact tree merged with a
 	 * different order of parents will be a _different_ changeset even
 	 * if everything else stays the same.
 	 */
-	while (parents) {
-		struct commit *parent = pop_commit(&parents);
-		strbuf_addf(&buffer, "parent %s\n",
-			    oid_to_hex(&parent->object.oid));
-	}
+	for (i = 0; i < parents_len; i++)
+		strbuf_addf(buffer, "parent %s\n", oid_to_hex(&parents[i]));
 
 	/* Person/date information */
 	if (!author)
 		author = git_author_info(IDENT_STRICT);
-	strbuf_addf(&buffer, "author %s\n", author);
+	strbuf_addf(buffer, "author %s\n", author);
 	if (!committer)
 		committer = git_committer_info(IDENT_STRICT);
-	strbuf_addf(&buffer, "committer %s\n", committer);
+	strbuf_addf(buffer, "committer %s\n", committer);
 	if (!encoding_is_utf8)
-		strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
+		strbuf_addf(buffer, "encoding %s\n", git_commit_encoding);
 
 	while (extra) {
-		add_extra_header(&buffer, extra);
+		add_extra_header(buffer, extra);
 		extra = extra->next;
 	}
-	strbuf_addch(&buffer, '\n');
+	strbuf_addch(buffer, '\n');
 
 	/* And add the comment */
-	strbuf_add(&buffer, msg, msg_len);
+	strbuf_add(buffer, msg, msg_len);
+}
 
-	/* And check the encoding */
-	if (encoding_is_utf8 && !verify_utf8(&buffer))
-		fprintf(stderr, _(commit_utf8_warn));
+int commit_tree_extended(const char *msg, size_t msg_len,
+			 const struct object_id *tree,
+			 struct commit_list *parents, struct object_id *ret,
+			 const char *author, const char *committer,
+			 const char *sign_commit,
+			 struct commit_extra_header *extra)
+{
+	struct repository *r = the_repository;
+	int result = 0;
+	int encoding_is_utf8;
+	struct strbuf buffer = STRBUF_INIT, compat_buffer = STRBUF_INIT;
+	struct strbuf sig = STRBUF_INIT, compat_sig = STRBUF_INIT;
+	struct object_id *parent_buf = NULL, *compat_oid = NULL;
+	struct object_id compat_oid_buf;
+	size_t i, nparents;
+
+	/* Not having i18n.commitencoding is the same as having utf-8 */
+	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
+
+	assert_oid_type(tree, OBJ_TREE);
+
+	if (memchr(msg, '\0', msg_len))
+		return error("a NUL byte in commit log message not allowed.");
 
-	if (sign_commit && sign_with_header(&buffer, sign_commit)) {
+	nparents = commit_list_count(parents);
+	CALLOC_ARRAY(parent_buf, nparents);
+	i = 0;
+	while (parents) {
+		struct commit *parent = pop_commit(&parents);
+		oidcpy(&parent_buf[i++], &parent->object.oid);
+	}
+
+	write_commit_tree(&buffer, msg, msg_len, tree, parent_buf, nparents, author, committer, extra);
+	if (sign_commit && sign_commit_to_strbuf(&sig, &buffer, sign_commit)) {
 		result = -1;
 		goto out;
 	}
+	if (r->compat_hash_algo) {
+		struct object_id mapped_tree;
+		struct object_id *mapped_parents;
+
+		CALLOC_ARRAY(mapped_parents, nparents);
+
+		if (repo_oid_to_algop(r, tree, r->compat_hash_algo, &mapped_tree)) {
+			result = -1;
+			free(mapped_parents);
+			goto out;
+		}
+		for (i = 0; i < nparents; i++)
+			if (repo_oid_to_algop(r, &parent_buf[i], r->compat_hash_algo, &mapped_parents[i])) {
+				result = -1;
+				free(mapped_parents);
+				goto out;
+			}
+		write_commit_tree(&compat_buffer, msg, msg_len, &mapped_tree,
+				  mapped_parents, nparents, author, committer, extra);
+		free(mapped_parents);
+
+		if (sign_commit && sign_commit_to_strbuf(&compat_sig, &compat_buffer, sign_commit)) {
+			result = -1;
+			goto out;
+		}
+	}
+
+	if (sign_commit) {
+		struct sig_pairs {
+			struct strbuf *sig;
+			const struct git_hash_algo *algo;
+		} bufs [2] = {
+			{ &compat_sig, r->compat_hash_algo },
+			{ &sig, r->hash_algo },
+		};
+		int i;
+
+		/*
+		 * We write algorithms in the order they were implemented in
+		 * Git to produce a stable hash when multiple algorithms are
+		 * used.
+		 */
+		if (r->compat_hash_algo && hash_algo_by_ptr(bufs[0].algo) > hash_algo_by_ptr(bufs[1].algo))
+			SWAP(bufs[0], bufs[1]);
+
+		/*
+		 * We traverse each algorithm in order, and apply the signature
+		 * to each buffer.
+		 */
+		for (i = 0; i < ARRAY_SIZE(bufs); i++) {
+			if (!bufs[i].algo)
+				continue;
+			add_commit_signature(&buffer, bufs[i].sig, bufs[i].algo);
+			if (r->compat_hash_algo)
+				add_commit_signature(&compat_buffer, bufs[i].sig, bufs[i].algo);
+		}
+	}
 
-	result = write_object_file(buffer.buf, buffer.len, OBJ_COMMIT, ret);
+	/* And check the encoding. */
+	if (encoding_is_utf8 && (!verify_utf8(&buffer) || !verify_utf8(&compat_buffer)))
+		fprintf(stderr, _(commit_utf8_warn));
+
+	if (r->compat_hash_algo) {
+		hash_object_file(r->compat_hash_algo, compat_buffer.buf, compat_buffer.len,
+			OBJ_COMMIT, &compat_oid_buf);
+		compat_oid = &compat_oid_buf;
+	}
+
+	result = write_object_file_flags(buffer.buf, buffer.len, OBJ_COMMIT,
+					 ret, compat_oid, 0);
 out:
+	free(parent_buf);
 	strbuf_release(&buffer);
+	strbuf_release(&compat_buffer);
+	strbuf_release(&sig);
+	strbuf_release(&compat_sig);
 	return result;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 10/30] commit: convert mergetag before computing the signature of a commit
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (8 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 09/30] commit: write commits for both hashes Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 11/30] commit: export add_header_signature to support handling signatures on tags Eric W. Biederman
                     ` (21 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

It so happens that commit mergetag lines embed a tag object.  So to
compute the compatible signature of a commit object that has mergetag
lines the compatible embedded tag must be computed first.

Implement this by duplicating and converting the commit extra headers
into the compatible version of the commit extra headers, that need
to be passed to commit_tree_extended.

To handle merge tags only the compatible extra headers need to be
computed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 commit.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index 6765f3a82b9d..913e015966b4 100644
--- a/commit.c
+++ b/commit.c
@@ -1355,6 +1355,39 @@ void append_merge_tag_headers(struct commit_list *parents,
 	}
 }
 
+static int convert_commit_extra_headers(struct commit_extra_header *orig,
+					struct commit_extra_header **result)
+{
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
+	const struct git_hash_algo *algo = the_repository->hash_algo;
+	struct commit_extra_header *extra = NULL, **tail = &extra;
+	struct strbuf out = STRBUF_INIT;
+	while (orig) {
+		struct commit_extra_header *new;
+		CALLOC_ARRAY(new, 1);
+		if (!strcmp(orig->key, "mergetag")) {
+			if (convert_object_file(&out, algo, compat,
+						orig->value, orig->len,
+						OBJ_TAG, 1)) {
+				free(new);
+				free_commit_extra_headers(extra);
+				return -1;
+			}
+			new->key = xstrdup("mergetag");
+			new->value = strbuf_detach(&out, &new->len);
+		} else {
+			new->key = xstrdup(orig->key);
+			new->len = orig->len;
+			new->value = xmemdupz(orig->value, orig->len);
+		}
+		*tail = new;
+		tail = &new->next;
+		orig = orig->next;
+	}
+	*result = extra;
+	return 0;
+}
+
 static void add_extra_header(struct strbuf *buffer,
 			     struct commit_extra_header *extra)
 {
@@ -1679,6 +1712,7 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 		goto out;
 	}
 	if (r->compat_hash_algo) {
+		struct commit_extra_header *compat_extra = NULL;
 		struct object_id mapped_tree;
 		struct object_id *mapped_parents;
 
@@ -1695,8 +1729,14 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 				free(mapped_parents);
 				goto out;
 			}
+		if (convert_commit_extra_headers(extra, &compat_extra)) {
+			result = -1;
+			free(mapped_parents);
+			goto out;
+		}
 		write_commit_tree(&compat_buffer, msg, msg_len, &mapped_tree,
-				  mapped_parents, nparents, author, committer, extra);
+				  mapped_parents, nparents, author, committer, compat_extra);
+		free_commit_extra_headers(compat_extra);
 		free(mapped_parents);
 
 		if (sign_commit && sign_commit_to_strbuf(&compat_sig, &compat_buffer, sign_commit)) {
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 11/30] commit: export add_header_signature to support handling signatures on tags
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (9 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 10/30] commit: convert mergetag before computing the signature of a commit Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 12/30] tag: sign both hashes Eric W. Biederman
                     ` (20 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Rename add_commit_signature as add_header_signature, and expose it so
that it can be used for converting tags from one object format to
another.

Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 commit.c | 6 +++---
 commit.h | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index 913e015966b4..2b61a4d0aa11 100644
--- a/commit.c
+++ b/commit.c
@@ -1101,7 +1101,7 @@ static const char *gpg_sig_headers[] = {
 	"gpgsig-sha256",
 };
 
-static int add_commit_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
+int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo)
 {
 	int inspos, copypos;
 	const char *eoh;
@@ -1770,9 +1770,9 @@ int commit_tree_extended(const char *msg, size_t msg_len,
 		for (i = 0; i < ARRAY_SIZE(bufs); i++) {
 			if (!bufs[i].algo)
 				continue;
-			add_commit_signature(&buffer, bufs[i].sig, bufs[i].algo);
+			add_header_signature(&buffer, bufs[i].sig, bufs[i].algo);
 			if (r->compat_hash_algo)
-				add_commit_signature(&compat_buffer, bufs[i].sig, bufs[i].algo);
+				add_header_signature(&compat_buffer, bufs[i].sig, bufs[i].algo);
 		}
 	}
 
diff --git a/commit.h b/commit.h
index 28928833c544..03edcec0129f 100644
--- a/commit.h
+++ b/commit.h
@@ -370,5 +370,6 @@ int parse_buffer_signed_by_header(const char *buffer,
 				  struct strbuf *payload,
 				  struct strbuf *signature,
 				  const struct git_hash_algo *algop);
+int add_header_signature(struct strbuf *buf, struct strbuf *sig, const struct git_hash_algo *algo);
 
 #endif /* COMMIT_H */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 12/30] tag: sign both hashes
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (10 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 11/30] commit: export add_header_signature to support handling signatures on tags Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 13/30] cache: add a function to read an OID of a specific algorithm Eric W. Biederman
                     ` (19 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

When we write a tag the object oid is specific to the hash algorithm.

This matters when a tag is signed.  The hash transition plan calls for
signatures on both the sha1 form and the sha256 form of the object,
and for both of those signatures to live in the tag object.

To generate tag object with multiple signatures, first compute the
unsigned form of the tag, and then if the tag is being signed compute
the unsigned form of the tag with the compatibilityr hash.  Then
compute compute the signatures of both buffers.

Once the signatures are computed add them to both buffers.  This
allows computing the compatibility hash in do_sign, saving
write_object_file the expense of recomputing the compatibility tag
just to compute it's hash.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 builtin/tag.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index 3918eacbb57b..8c4bc28952c2 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -28,6 +28,7 @@
 #include "ref-filter.h"
 #include "date.h"
 #include "write-or-die.h"
+#include "object-file-convert.h"
 
 static const char * const git_tag_usage[] = {
 	N_("git tag [-a | -s | -u <key-id>] [-f] [-m <msg> | -F <file>] [-e]\n"
@@ -174,9 +175,43 @@ static int verify_tag(const char *name, const char *ref UNUSED,
 	return 0;
 }
 
-static int do_sign(struct strbuf *buffer)
+static int do_sign(struct strbuf *buffer, struct object_id **compat_oid,
+		   struct object_id *compat_oid_buf)
 {
-	return sign_buffer(buffer, buffer, get_signing_key());
+	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
+	struct strbuf sig = STRBUF_INIT, compat_sig = STRBUF_INIT;
+	struct strbuf compat_buf = STRBUF_INIT;
+	const char *keyid = get_signing_key();
+	int ret = -1;
+
+	if (sign_buffer(buffer, &sig, keyid))
+		return -1;
+
+	if (compat) {
+		const struct git_hash_algo *algo = the_repository->hash_algo;
+
+		if (convert_object_file(&compat_buf, algo, compat,
+					buffer->buf, buffer->len, OBJ_TAG, 1))
+			goto out;
+		if (sign_buffer(&compat_buf, &compat_sig, keyid))
+			goto out;
+		add_header_signature(&compat_buf, &sig, algo);
+		strbuf_addbuf(&compat_buf, &compat_sig);
+		hash_object_file(compat, compat_buf.buf, compat_buf.len,
+				 OBJ_TAG, compat_oid_buf);
+		*compat_oid = compat_oid_buf;
+	}
+
+	if (compat_sig.len)
+		add_header_signature(buffer, &compat_sig, compat);
+
+	strbuf_addbuf(buffer, &sig);
+	ret = 0;
+out:
+	strbuf_release(&sig);
+	strbuf_release(&compat_sig);
+	strbuf_release(&compat_buf);
+	return ret;
 }
 
 static const char tag_template[] =
@@ -249,9 +284,11 @@ static void write_tag_body(int fd, const struct object_id *oid)
 
 static int build_tag_object(struct strbuf *buf, int sign, struct object_id *result)
 {
-	if (sign && do_sign(buf) < 0)
+	struct object_id *compat_oid = NULL, compat_oid_buf;
+	if (sign && do_sign(buf, &compat_oid, &compat_oid_buf) < 0)
 		return error(_("unable to sign the tag"));
-	if (write_object_file(buf->buf, buf->len, OBJ_TAG, result) < 0)
+	if (write_object_file_flags(buf->buf, buf->len, OBJ_TAG, result,
+				    compat_oid, 0) < 0)
 		return error(_("unable to write tag file"));
 	return 0;
 }
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 13/30] cache: add a function to read an OID of a specific algorithm
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (11 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 12/30] tag: sign both hashes Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 14/30] object: factor out parse_mode out of fast-import and tree-walk into in object.h Eric W. Biederman
                     ` (18 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

Currently, we always read a object ID of the current algorithm with
oidread.  However, once we start converting objects, we'll need to
consider what happens when we want to read an object ID of a specific
algorithm, such as the compatibility algorithm.  To make this easier,
let's define oidread_algop, which specifies which algorithm we should
use for our object ID, and define oidread in terms of it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 hash.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hash.h b/hash.h
index 615ae0691d07..e064807c1733 100644
--- a/hash.h
+++ b/hash.h
@@ -73,10 +73,15 @@ static inline void oidclr(struct object_id *oid)
 	oid->algo = hash_algo_by_ptr(the_hash_algo);
 }
 
+static inline void oidread_algop(struct object_id *oid, const unsigned char *hash, const struct git_hash_algo *algop)
+{
+	memcpy(oid->hash, hash, algop->rawsz);
+	oid->algo = hash_algo_by_ptr(algop);
+}
+
 static inline void oidread(struct object_id *oid, const unsigned char *hash)
 {
-	memcpy(oid->hash, hash, the_hash_algo->rawsz);
-	oid->algo = hash_algo_by_ptr(the_hash_algo);
+	oidread_algop(oid, hash, the_hash_algo);
 }
 
 static inline int is_empty_blob_sha1(const unsigned char *sha1)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 14/30] object: factor out parse_mode out of fast-import and tree-walk into in object.h
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (12 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 13/30] cache: add a function to read an OID of a specific algorithm Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 15/30] object-file-convert: add a function to convert trees between algorithms Eric W. Biederman
                     ` (17 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

builtin/fast-import.c and tree-walk.c have almost identical version of
get_mode.  The two functions started out the same but have diverged
slightly.  The version in fast-import changed mode to a uint16_t to
save memory.  The version in tree-walk started erroring if no mode was
present.

As far as I can tell both of these changes are valid for both of the
callers, so add the both changes and place the common parsing helper
in object.h

Rename the helper from get_mode to parse_mode so it does not
conflict with another helper named get_mode in diff-no-index.c

This will be used shortly in a new helper decode_tree_entry_raw
which is used to compute cmpatibility objects as part of
the sha256 transition.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/fast-import.c | 18 ++----------------
 object.h              | 18 ++++++++++++++++++
 tree-walk.c           | 22 +++-------------------
 3 files changed, 23 insertions(+), 35 deletions(-)

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 4dbb10aff3da..2c645fcfbe3f 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -1235,20 +1235,6 @@ static void *gfi_unpack_entry(
 	return unpack_entry(the_repository, p, oe->idx.offset, &type, sizep);
 }
 
-static const char *get_mode(const char *str, uint16_t *modep)
-{
-	unsigned char c;
-	uint16_t mode = 0;
-
-	while ((c = *str++) != ' ') {
-		if (c < '0' || c > '7')
-			return NULL;
-		mode = (mode << 3) + (c - '0');
-	}
-	*modep = mode;
-	return str;
-}
-
 static void load_tree(struct tree_entry *root)
 {
 	struct object_id *oid = &root->versions[1].oid;
@@ -1286,7 +1272,7 @@ static void load_tree(struct tree_entry *root)
 		t->entries[t->entry_count++] = e;
 
 		e->tree = NULL;
-		c = get_mode(c, &e->versions[1].mode);
+		c = parse_mode(c, &e->versions[1].mode);
 		if (!c)
 			die("Corrupt mode in %s", oid_to_hex(oid));
 		e->versions[0].mode = e->versions[1].mode;
@@ -2275,7 +2261,7 @@ static void file_change_m(const char *p, struct branch *b)
 	struct object_id oid;
 	uint16_t mode, inline_data = 0;
 
-	p = get_mode(p, &mode);
+	p = parse_mode(p, &mode);
 	if (!p)
 		die("Corrupt mode: %s", command_buf.buf);
 	switch (mode) {
diff --git a/object.h b/object.h
index 114d45954d08..70c8d4ae63dc 100644
--- a/object.h
+++ b/object.h
@@ -190,6 +190,24 @@ void *create_object(struct repository *r, const struct object_id *oid, void *obj
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
 
+
+static inline const char *parse_mode(const char *str, uint16_t *modep)
+{
+	unsigned char c;
+	unsigned int mode = 0;
+
+	if (*str == ' ')
+		return NULL;
+
+	while ((c = *str++) != ' ') {
+		if (c < '0' || c > '7')
+			return NULL;
+		mode = (mode << 3) + (c - '0');
+	}
+	*modep = mode;
+	return str;
+}
+
 /*
  * Returns the object, having parsed it to find out what it is.
  *
diff --git a/tree-walk.c b/tree-walk.c
index 29ead71be173..3af50a01c2c7 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -10,27 +10,11 @@
 #include "pathspec.h"
 #include "json-writer.h"
 
-static const char *get_mode(const char *str, unsigned int *modep)
-{
-	unsigned char c;
-	unsigned int mode = 0;
-
-	if (*str == ' ')
-		return NULL;
-
-	while ((c = *str++) != ' ') {
-		if (c < '0' || c > '7')
-			return NULL;
-		mode = (mode << 3) + (c - '0');
-	}
-	*modep = mode;
-	return str;
-}
-
 static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned long size, struct strbuf *err)
 {
 	const char *path;
-	unsigned int mode, len;
+	unsigned int len;
+	uint16_t mode;
 	const unsigned hashsz = the_hash_algo->rawsz;
 
 	if (size < hashsz + 3 || buf[size - (hashsz + 1)]) {
@@ -38,7 +22,7 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 		return -1;
 	}
 
-	path = get_mode(buf, &mode);
+	path = parse_mode(buf, &mode);
 	if (!path) {
 		strbuf_addstr(err, _("malformed mode in tree entry"));
 		return -1;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 15/30] object-file-convert: add a function to convert trees between algorithms
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (13 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 14/30] object: factor out parse_mode out of fast-import and tree-walk into in object.h Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 16/30] object-file-convert: convert tag objects when writing Eric W. Biederman
                     ` (16 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

In the future, we're going to want to provide SHA-256 repositories that
have compatibility support for SHA-1 as well.  In order to do so, we'll
need to be able to convert tree objects from SHA-256 to SHA-1 by writing
a tree with each SHA-256 object ID mapped to a SHA-1 object ID.

We implement a function, convert_tree_object, that takes an existing
tree buffer and writes it to a new strbuf, converting between
algorithms.  Let's make this function generic, because while we only
need it to convert from the main algorithm to the compatibility
algorithm now, we may need to do the other way around in the future,
such as for transport.

We avoid reusing the code in decode_tree_entry because that code
normalizes data, and we don't want that here.  We want to produce a
complete round trip of data, so if, for example, the old entry had a
wrongly zero-padded mode, we'd want to preserve that when converting to
ensure a stable hash value.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 object-file-convert.c | 51 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/object-file-convert.c b/object-file-convert.c
index 1ec945eaa17f..70b80fb61e54 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -1,8 +1,10 @@
 #include "git-compat-util.h"
 #include "gettext.h"
 #include "strbuf.h"
+#include "hex.h"
 #include "repository.h"
 #include "hash-ll.h"
+#include "hash.h"
 #include "object.h"
 #include "loose.h"
 #include "object-file-convert.h"
@@ -36,6 +38,51 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
 	return 0;
 }
 
+static int decode_tree_entry_raw(struct object_id *oid, const char **path,
+				 size_t *len, const struct git_hash_algo *algo,
+				 const char *buf, unsigned long size)
+{
+	uint16_t mode;
+	const unsigned hashsz = algo->rawsz;
+
+	if (size < hashsz + 3 || buf[size - (hashsz + 1)]) {
+		return -1;
+	}
+
+	*path = parse_mode(buf, &mode);
+	if (!*path || !**path)
+		return -1;
+	*len = strlen(*path) + 1;
+
+	oidread_algop(oid, (const unsigned char *)*path + *len, algo);
+	return 0;
+}
+
+static int convert_tree_object(struct strbuf *out,
+			       const struct git_hash_algo *from,
+			       const struct git_hash_algo *to,
+			       const char *buffer, size_t size)
+{
+	const char *p = buffer, *end = buffer + size;
+
+	while (p < end) {
+		struct object_id entry_oid, mapped_oid;
+		const char *path = NULL;
+		size_t pathlen;
+
+		if (decode_tree_entry_raw(&entry_oid, &path, &pathlen, from, p,
+					  end - p))
+			return error(_("failed to decode tree entry"));
+		if (repo_oid_to_algop(the_repository, &entry_oid, to, &mapped_oid))
+			return error(_("failed to map tree entry for %s"), oid_to_hex(&entry_oid));
+		strbuf_add(out, p, path - p);
+		strbuf_add(out, path, pathlen);
+		strbuf_add(out, mapped_oid.hash, to->rawsz);
+		p = path + pathlen + from->rawsz;
+	}
+	return 0;
+}
+
 int convert_object_file(struct strbuf *outbuf,
 			const struct git_hash_algo *from,
 			const struct git_hash_algo *to,
@@ -50,8 +97,10 @@ int convert_object_file(struct strbuf *outbuf,
 		BUG("Refusing noop object file conversion");
 
 	switch (type) {
-	case OBJ_COMMIT:
 	case OBJ_TREE:
+		ret = convert_tree_object(outbuf, from, to, buf, len);
+		break;
+	case OBJ_COMMIT:
 	case OBJ_TAG:
 	default:
 		/* Not implemented yet, so fail. */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 16/30] object-file-convert: convert tag objects when writing
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (14 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 15/30] object-file-convert: add a function to convert trees between algorithms Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 17/30] object-file-convert: don't leak when converting tag objects Eric W. Biederman
                     ` (15 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

When writing a tag object in a repository with both SHA-1 and SHA-256,
we'll need to convert our commit objects so that we can write the hash
values for both into the repository.  To do so, let's add a function to
convert tag objects.

Note that signatures for tag objects in the current algorithm trail the
message, and those for the alternate algorithm are in headers.
Therefore, we parse the tag object for both a trailing signature and a
header and then, when writing the other format, swap the two around.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 object-file-convert.c | 52 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/object-file-convert.c b/object-file-convert.c
index 70b80fb61e54..089b68442de8 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -7,6 +7,8 @@
 #include "hash.h"
 #include "object.h"
 #include "loose.h"
+#include "commit.h"
+#include "gpg-interface.h"
 #include "object-file-convert.h"
 
 int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
@@ -83,6 +85,52 @@ static int convert_tree_object(struct strbuf *out,
 	return 0;
 }
 
+static int convert_tag_object(struct strbuf *out,
+			      const struct git_hash_algo *from,
+			      const struct git_hash_algo *to,
+			      const char *buffer, size_t size)
+{
+	struct strbuf payload = STRBUF_INIT, temp = STRBUF_INIT, oursig = STRBUF_INIT, othersig = STRBUF_INIT;
+	size_t payload_size;
+	struct object_id oid, mapped_oid;
+	const char *p;
+
+	/* Add some slop for longer signature header in the new algorithm. */
+	strbuf_grow(out, size + 7);
+
+	/* Is there a signature for our algorithm? */
+	payload_size = parse_signed_buffer(buffer, size);
+	strbuf_add(&payload, buffer, payload_size);
+	if (payload_size != size) {
+		/* Yes, there is. */
+		strbuf_add(&oursig, buffer + payload_size, size - payload_size);
+	}
+	/* Now, is there a signature for the other algorithm? */
+	if (parse_buffer_signed_by_header(payload.buf, payload.len, &temp, &othersig, to)) {
+		/* Yes, there is. */
+		strbuf_swap(&payload, &temp);
+		strbuf_release(&temp);
+	}
+
+	/*
+	 * Our payload is now in payload and we may have up to two signatrures
+	 * in oursig and othersig.
+	 */
+	if (strncmp(payload.buf, "object ", 7) || payload.buf[from->hexsz + 7] != '\n')
+		return error("bogus tag object");
+	if (parse_oid_hex_algop(payload.buf + 7, &oid, &p, from) < 0)
+		return error("bad tag object ID");
+	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+		return error("unable to map tree %s in tag object",
+			     oid_to_hex(&oid));
+	strbuf_addf(out, "object %s", oid_to_hex(&mapped_oid));
+	strbuf_add(out, p, payload.len - (p - payload.buf));
+	strbuf_addbuf(out, &othersig);
+	if (oursig.len)
+		add_header_signature(out, &oursig, from);
+	return 0;
+}
+
 int convert_object_file(struct strbuf *outbuf,
 			const struct git_hash_algo *from,
 			const struct git_hash_algo *to,
@@ -100,8 +148,10 @@ int convert_object_file(struct strbuf *outbuf,
 	case OBJ_TREE:
 		ret = convert_tree_object(outbuf, from, to, buf, len);
 		break;
-	case OBJ_COMMIT:
 	case OBJ_TAG:
+		ret = convert_tag_object(outbuf, from, to, buf, len);
+		break;
+	case OBJ_COMMIT:
 	default:
 		/* Not implemented yet, so fail. */
 		ret = -1;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 17/30] object-file-convert: don't leak when converting tag objects
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (15 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 16/30] object-file-convert: convert tag objects when writing Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 18/30] object-file-convert: convert commit objects when writing Eric W. Biederman
                     ` (14 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Upon close examination I discovered that while brian's code to convert
tag objects was functionally correct, it leaked memory.

Rearrange the code so that all error checking happens before any
memory is allocated.

Add code to release the temporary strbufs the code uses.

The code pretty much assumes the tag object ends with a newline,
so add an explict test to verify that is the case.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 object-file-convert.c | 45 ++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/object-file-convert.c b/object-file-convert.c
index 089b68442de8..79e8e211ff95 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -90,44 +90,49 @@ static int convert_tag_object(struct strbuf *out,
 			      const struct git_hash_algo *to,
 			      const char *buffer, size_t size)
 {
-	struct strbuf payload = STRBUF_INIT, temp = STRBUF_INIT, oursig = STRBUF_INIT, othersig = STRBUF_INIT;
+	struct strbuf payload = STRBUF_INIT, oursig = STRBUF_INIT, othersig = STRBUF_INIT;
+	const int entry_len = from->hexsz + 7;
 	size_t payload_size;
 	struct object_id oid, mapped_oid;
 	const char *p;
 
-	/* Add some slop for longer signature header in the new algorithm. */
-	strbuf_grow(out, size + 7);
+	/* Consume the object line */
+	if ((entry_len >= size) ||
+	    memcmp(buffer, "object ", 7) || buffer[entry_len] != '\n')
+		return error("bogus tag object");
+	if (parse_oid_hex_algop(buffer + 7, &oid, &p, from) < 0)
+		return error("bad tag object ID");
+	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+		return error("unable to map tree %s in tag object",
+			     oid_to_hex(&oid));
+	size -= ((p + 1) - buffer);
+	buffer = p + 1;
 
 	/* Is there a signature for our algorithm? */
 	payload_size = parse_signed_buffer(buffer, size);
-	strbuf_add(&payload, buffer, payload_size);
 	if (payload_size != size) {
 		/* Yes, there is. */
 		strbuf_add(&oursig, buffer + payload_size, size - payload_size);
 	}
-	/* Now, is there a signature for the other algorithm? */
-	if (parse_buffer_signed_by_header(payload.buf, payload.len, &temp, &othersig, to)) {
-		/* Yes, there is. */
-		strbuf_swap(&payload, &temp);
-		strbuf_release(&temp);
-	}
 
+	/* Now, is there a signature for the other algorithm? */
+	parse_buffer_signed_by_header(buffer, payload_size, &payload, &othersig, to);
 	/*
 	 * Our payload is now in payload and we may have up to two signatrures
 	 * in oursig and othersig.
 	 */
-	if (strncmp(payload.buf, "object ", 7) || payload.buf[from->hexsz + 7] != '\n')
-		return error("bogus tag object");
-	if (parse_oid_hex_algop(payload.buf + 7, &oid, &p, from) < 0)
-		return error("bad tag object ID");
-	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
-		return error("unable to map tree %s in tag object",
-			     oid_to_hex(&oid));
-	strbuf_addf(out, "object %s", oid_to_hex(&mapped_oid));
-	strbuf_add(out, p, payload.len - (p - payload.buf));
-	strbuf_addbuf(out, &othersig);
+
+	/* Add some slop for longer signature header in the new algorithm. */
+	strbuf_grow(out, (7 + to->hexsz + 1) + size + 7);
+	strbuf_addf(out, "object %s\n", oid_to_hex(&mapped_oid));
+	strbuf_addbuf(out, &payload);
 	if (oursig.len)
 		add_header_signature(out, &oursig, from);
+	strbuf_addbuf(out, &othersig);
+
+	strbuf_release(&payload);
+	strbuf_release(&othersig);
+	strbuf_release(&oursig);
 	return 0;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 18/30] object-file-convert: convert commit objects when writing
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (16 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 17/30] object-file-convert: don't leak when converting tag objects Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 19/30] object-file-convert: convert commits that embed signed tags Eric W. Biederman
                     ` (13 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

When writing a commit object in a repository with both SHA-1 and
SHA-256, we'll need to convert our commit objects so that we can write
the hash values for both into the repository.  To do so, let's add a
function to convert commit objects.

Read the commit object and map the tree value and any of the parent
values, and copy the rest of the commit through unmodified.  Note that
we don't need to modify the signature headers, because they are the
same under both algorithms.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 object-file-convert.c | 46 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/object-file-convert.c b/object-file-convert.c
index 79e8e211ff95..0da081104ed4 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -136,6 +136,48 @@ static int convert_tag_object(struct strbuf *out,
 	return 0;
 }
 
+static int convert_commit_object(struct strbuf *out,
+				 const struct git_hash_algo *from,
+				 const struct git_hash_algo *to,
+				 const char *buffer, size_t size)
+{
+	const char *tail = buffer;
+	const char *bufptr = buffer;
+	const int tree_entry_len = from->hexsz + 5;
+	const int parent_entry_len = from->hexsz + 7;
+	struct object_id oid, mapped_oid;
+	const char *p;
+
+	tail += size;
+	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
+			bufptr[tree_entry_len] != '\n')
+		return error("bogus commit object");
+	if (parse_oid_hex_algop(bufptr + 5, &oid, &p, from) < 0)
+		return error("bad tree pointer");
+
+	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+		return error("unable to map tree %s in commit object",
+			     oid_to_hex(&oid));
+	strbuf_addf(out, "tree %s\n", oid_to_hex(&mapped_oid));
+	bufptr = p + 1;
+
+	while (bufptr + parent_entry_len < tail && !memcmp(bufptr, "parent ", 7)) {
+		if (tail <= bufptr + parent_entry_len + 1 ||
+		    parse_oid_hex_algop(bufptr + 7, &oid, &p, from) ||
+		    *p != '\n')
+			return error("bad parents in commit");
+
+		if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+			return error("unable to map parent %s in commit object",
+				     oid_to_hex(&oid));
+
+		strbuf_addf(out, "parent %s\n", oid_to_hex(&mapped_oid));
+		bufptr = p + 1;
+	}
+	strbuf_add(out, bufptr, tail - bufptr);
+	return 0;
+}
+
 int convert_object_file(struct strbuf *outbuf,
 			const struct git_hash_algo *from,
 			const struct git_hash_algo *to,
@@ -150,13 +192,15 @@ int convert_object_file(struct strbuf *outbuf,
 		BUG("Refusing noop object file conversion");
 
 	switch (type) {
+	case OBJ_COMMIT:
+		ret = convert_commit_object(outbuf, from, to, buf, len);
+		break;
 	case OBJ_TREE:
 		ret = convert_tree_object(outbuf, from, to, buf, len);
 		break;
 	case OBJ_TAG:
 		ret = convert_tag_object(outbuf, from, to, buf, len);
 		break;
-	case OBJ_COMMIT:
 	default:
 		/* Not implemented yet, so fail. */
 		ret = -1;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 19/30] object-file-convert: convert commits that embed signed tags
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (17 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 18/30] object-file-convert: convert commit objects when writing Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 20/30] object-file: update object_info_extended to reencode objects Eric W. Biederman
                     ` (12 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

As mentioned in the hash function transition plan commit mergetag
lines need to be handled.  The commit mergetag lines embed an entire
tag object in a commit object.

Keep the implementation sane if not fast by unembedding the tag
object, converting the tag object, and embedding the new tag object,
in the new commit object.

In the long run I don't expect any other approach is maintainable, as
tag objects may be extended in ways that require additional
translation.

To keep the implementation of convert_commit_object maintainable I
have modified convert_commit_object to process the lines in any order,
and to fail on unknown lines.  We can't know ahead of time if a new
line might embed something that needs translation or not so it is
better to fail and require the code to be updated instead of silently
mistranslating objects.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 object-file-convert.c | 104 +++++++++++++++++++++++++++++++++---------
 1 file changed, 82 insertions(+), 22 deletions(-)

diff --git a/object-file-convert.c b/object-file-convert.c
index 0da081104ed4..4f6189095be8 100644
--- a/object-file-convert.c
+++ b/object-file-convert.c
@@ -146,35 +146,95 @@ static int convert_commit_object(struct strbuf *out,
 	const int tree_entry_len = from->hexsz + 5;
 	const int parent_entry_len = from->hexsz + 7;
 	struct object_id oid, mapped_oid;
-	const char *p;
+	const char *p, *eol;
 
 	tail += size;
-	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
-			bufptr[tree_entry_len] != '\n')
-		return error("bogus commit object");
-	if (parse_oid_hex_algop(bufptr + 5, &oid, &p, from) < 0)
-		return error("bad tree pointer");
 
-	if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
-		return error("unable to map tree %s in commit object",
-			     oid_to_hex(&oid));
-	strbuf_addf(out, "tree %s\n", oid_to_hex(&mapped_oid));
-	bufptr = p + 1;
+	while ((bufptr < tail) && (*bufptr != '\n')) {
+		eol = memchr(bufptr, '\n', tail - bufptr);
+		if (!eol)
+			return error(_("bad %s in commit"), "line");
+
+		if (((bufptr + 5) < eol) && !memcmp(bufptr, "tree ", 5))
+		{
+			if (((bufptr + tree_entry_len) != eol) ||
+			    parse_oid_hex_algop(bufptr + 5, &oid, &p, from) ||
+			    (p != eol))
+				return error(_("bad %s in commit"), "tree");
+
+			if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+				return error(_("unable to map %s %s in commit object"),
+					     "tree", oid_to_hex(&oid));
+			strbuf_addf(out, "tree %s\n", oid_to_hex(&mapped_oid));
+		}
+		else if (((bufptr + 7) < eol) && !memcmp(bufptr, "parent ", 7))
+		{
+			if (((bufptr + parent_entry_len) != eol) ||
+			    parse_oid_hex_algop(bufptr + 7, &oid, &p, from) ||
+			    (p != eol))
+				return error(_("bad %s in commit"), "parent");
 
-	while (bufptr + parent_entry_len < tail && !memcmp(bufptr, "parent ", 7)) {
-		if (tail <= bufptr + parent_entry_len + 1 ||
-		    parse_oid_hex_algop(bufptr + 7, &oid, &p, from) ||
-		    *p != '\n')
-			return error("bad parents in commit");
+			if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
+				return error(_("unable to map %s %s in commit object"),
+					     "parent", oid_to_hex(&oid));
 
-		if (repo_oid_to_algop(the_repository, &oid, to, &mapped_oid))
-			return error("unable to map parent %s in commit object",
-				     oid_to_hex(&oid));
+			strbuf_addf(out, "parent %s\n", oid_to_hex(&mapped_oid));
+		}
+		else if (((bufptr + 9) < eol) && !memcmp(bufptr, "mergetag ", 9))
+		{
+			struct strbuf tag = STRBUF_INIT, new_tag = STRBUF_INIT;
 
-		strbuf_addf(out, "parent %s\n", oid_to_hex(&mapped_oid));
-		bufptr = p + 1;
+			/* Recover the tag object from the mergetag */
+			strbuf_add(&tag, bufptr + 9, (eol - (bufptr + 9)) + 1);
+
+			bufptr = eol + 1;
+			while ((bufptr < tail) && (*bufptr == ' ')) {
+				eol = memchr(bufptr, '\n', tail - bufptr);
+				if (!eol) {
+					strbuf_release(&tag);
+					return error(_("bad %s in commit"), "mergetag continuation");
+				}
+				strbuf_add(&tag, bufptr + 1, (eol - (bufptr + 1)) + 1);
+				bufptr = eol + 1;
+			}
+
+			/* Compute the new tag object */
+			if (convert_tag_object(&new_tag, from, to, tag.buf, tag.len)) {
+				strbuf_release(&tag);
+				strbuf_release(&new_tag);
+				return -1;
+			}
+
+			/* Write the new mergetag */
+			strbuf_addstr(out, "mergetag");
+			strbuf_add_lines(out, " ", new_tag.buf, new_tag.len);
+			strbuf_release(&tag);
+			strbuf_release(&new_tag);
+		}
+		else if (((bufptr + 7) < tail) && !memcmp(bufptr, "author ", 7))
+			strbuf_add(out, bufptr, (eol - bufptr) + 1);
+		else if (((bufptr + 10) < tail) && !memcmp(bufptr, "committer ", 10))
+			strbuf_add(out, bufptr, (eol - bufptr) + 1);
+		else if (((bufptr + 9) < tail) && !memcmp(bufptr, "encoding ", 9))
+			strbuf_add(out, bufptr, (eol - bufptr) + 1);
+		else if (((bufptr + 6) < tail) && !memcmp(bufptr, "gpgsig", 6))
+			strbuf_add(out, bufptr, (eol - bufptr) + 1);
+		else {
+			/* Unknown line fail it might embed an oid */
+			return -1;
+		}
+		/* Consume any trailing continuation lines */
+		bufptr = eol + 1;
+		while ((bufptr < tail) && (*bufptr == ' ')) {
+			eol = memchr(bufptr, '\n', tail - bufptr);
+			if (!eol)
+				return error(_("bad %s in commit"), "continuation");
+			strbuf_add(out, bufptr, (eol - bufptr) + 1);
+			bufptr = eol + 1;
+		}
 	}
-	strbuf_add(out, bufptr, tail - bufptr);
+	if (bufptr < tail)
+		strbuf_add(out, bufptr, tail - bufptr);
 	return 0;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 20/30] object-file: update object_info_extended to reencode objects
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (18 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 19/30] object-file-convert: convert commits that embed signed tags Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 21/30] repository: implement extensions.compatObjectFormat Eric W. Biederman
                     ` (11 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

oid_object_info_extended is updated to detect an oid encoding that
does not match the current repository, use repo_oid_to_algop to find
the correspoding oid in the current repository and to return the data
for the oid.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 object-file.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/object-file.c b/object-file.c
index 820810a5f4b3..b2d43d009898 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1662,10 +1662,101 @@ static int do_oid_object_info_extended(struct repository *r,
 	return 0;
 }
 
+static int oid_object_info_convert(struct repository *r,
+				   const struct object_id *input_oid,
+				   struct object_info *input_oi, unsigned flags)
+{
+	const struct git_hash_algo *input_algo = &hash_algos[input_oid->algo];
+	int do_die = flags & OBJECT_INFO_DIE_IF_CORRUPT;
+	struct strbuf type_name = STRBUF_INIT;
+	struct object_id oid, delta_base_oid;
+	struct object_info new_oi, *oi;
+	unsigned long size;
+	void *content;
+	int ret;
+
+	if (repo_oid_to_algop(r, input_oid, the_hash_algo, &oid)) {
+		if (do_die)
+			die(_("missing mapping of %s to %s"),
+			    oid_to_hex(input_oid), the_hash_algo->name);
+		return -1;
+	}
+
+	/* Is new_oi needed? */
+	oi = input_oi;
+	if (input_oi && (input_oi->delta_base_oid || input_oi->sizep ||
+			 input_oi->contentp)) {
+		new_oi = *input_oi;
+		/* Does delta_base_oid need to be converted? */
+		if (input_oi->delta_base_oid)
+			new_oi.delta_base_oid = &delta_base_oid;
+		/* Will the attributes differ when converted? */
+		if (input_oi->sizep || input_oi->contentp) {
+			new_oi.contentp = &content;
+			new_oi.sizep = &size;
+			new_oi.type_name = &type_name;
+		}
+		oi = &new_oi;
+	}
+
+	ret = oid_object_info_extended(r, &oid, oi, flags);
+	if (ret)
+		return -1;
+	if (oi == input_oi)
+		return ret;
+
+	if (new_oi.contentp) {
+		struct strbuf outbuf = STRBUF_INIT;
+		enum object_type type;
+
+		type = type_from_string_gently(type_name.buf, type_name.len,
+					       !do_die);
+		if (type == -1)
+			return -1;
+		if (type != OBJ_BLOB) {
+			ret = convert_object_file(&outbuf,
+						  the_hash_algo, input_algo,
+						  content, size, type, !do_die);
+			if (ret == -1)
+				return -1;
+			free(content);
+			size = outbuf.len;
+			content = strbuf_detach(&outbuf, NULL);
+		}
+		if (input_oi->sizep)
+			*input_oi->sizep = size;
+		if (input_oi->contentp)
+			*input_oi->contentp = content;
+		else
+			free(content);
+		if (input_oi->type_name)
+			*input_oi->type_name = type_name;
+		else
+			strbuf_release(&type_name);
+	}
+	if (new_oi.delta_base_oid == &delta_base_oid) {
+		if (repo_oid_to_algop(r, &delta_base_oid, input_algo,
+				 input_oi->delta_base_oid)) {
+			if (do_die)
+				die(_("missing mapping of %s to %s"),
+				    oid_to_hex(&delta_base_oid),
+				    input_algo->name);
+			return -1;
+		}
+	}
+	input_oi->whence = new_oi.whence;
+	input_oi->u = new_oi.u;
+	return ret;
+}
+
 int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 			     struct object_info *oi, unsigned flags)
 {
 	int ret;
+
+	if (oid->algo && (hash_algo_by_ptr(r->hash_algo) != oid->algo))
+		return oid_object_info_convert(r, oid, oi, flags);
+
 	obj_read_lock();
 	ret = do_oid_object_info_extended(r, oid, oi, flags);
 	obj_read_unlock();
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 21/30] repository: implement extensions.compatObjectFormat
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (19 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 20/30] object-file: update object_info_extended to reencode objects Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 22/30] rev-parse: add an --output-object-format parameter Eric W. Biederman
                     ` (10 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "brian m. carlson" <sandals@crustytoothpaste.net>

Add a configuration option to enable updating and reading from
compatibility hash maps when git accesses the reposotiry.

Call the helper function repo_set_compat_hash_algo with the value
that compatObjectFormat is set to.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Documentation/config/extensions.txt | 12 ++++++++++++
 repository.c                        |  2 +-
 setup.c                             | 23 +++++++++++++++++++++--
 setup.h                             |  1 +
 4 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt
index bccaec7a9636..9f72e6d9f4f1 100644
--- a/Documentation/config/extensions.txt
+++ b/Documentation/config/extensions.txt
@@ -7,6 +7,18 @@ Note that this setting should only be set by linkgit:git-init[1] or
 linkgit:git-clone[1].  Trying to change it after initialization will not
 work and will produce hard-to-diagnose issues.
 
+extensions.compatObjectFormat::
+
+	Specify a compatitbility hash algorithm to use.  The acceptable values
+	are `sha1` and `sha256`.  The value specified must be different from the
+	value of extensions.objectFormat.  This allows client level
+	interoperability between git repositories whose objectFormat matches
+	this compatObjectFormat.  In particular when fully implemented the
+	pushes and pulls from a repository in whose objectFormat matches
+	compatObjectFormat.  As well as being able to use oids encoded in
+	compatObjectFormat in addition to oids encoded with objectFormat to
+	locally specify objects.
+
 extensions.worktreeConfig::
 	If enabled, then worktrees will load config settings from the
 	`$GIT_DIR/config.worktree` file in addition to the
diff --git a/repository.c b/repository.c
index 6214f61cf4e7..9d91536b613b 100644
--- a/repository.c
+++ b/repository.c
@@ -194,7 +194,7 @@ int repo_init(struct repository *repo,
 		goto error;
 
 	repo_set_hash_algo(repo, format.hash_algo);
-	repo_set_compat_hash_algo(repo, GIT_HASH_UNKNOWN);
+	repo_set_compat_hash_algo(repo, format.compat_hash_algo);
 	repo->repository_format_worktree_config = format.worktree_config;
 
 	/* take ownership of format.partial_clone */
diff --git a/setup.c b/setup.c
index aa8bf5da5226..85259a259be3 100644
--- a/setup.c
+++ b/setup.c
@@ -590,6 +590,25 @@ static enum extension_result handle_extension(const char *var,
 				     "extensions.objectformat", value);
 		data->hash_algo = format;
 		return EXTENSION_OK;
+	} else if (!strcmp(ext, "compatobjectformat")) {
+		struct string_list_item *item;
+		int format;
+
+		if (!value)
+			return config_error_nonbool(var);
+		format = hash_algo_by_name(value);
+		if (format == GIT_HASH_UNKNOWN)
+			return error(_("invalid value for '%s': '%s'"),
+				     "extensions.compatobjectformat", value);
+		/* For now only support compatObjectFormat being specified once. */
+		for_each_string_list_item(item, &data->v1_only_extensions) {
+			if (!strcmp(item->string, "compatobjectformat"))
+				return error(_("'%s' already specified as '%s'"),
+					"extensions.compatobjectformat",
+					hash_algos[data->compat_hash_algo].name);
+		}
+		data->compat_hash_algo = format;
+		return EXTENSION_OK;
 	}
 	return EXTENSION_UNKNOWN;
 }
@@ -1565,7 +1584,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
 			repo_set_compat_hash_algo(the_repository,
-						  GIT_HASH_UNKNOWN);
+						  repo_fmt.compat_hash_algo);
 			the_repository->repository_format_worktree_config =
 				repo_fmt.worktree_config;
 			/* take ownership of repo_fmt.partial_clone */
@@ -1659,7 +1678,7 @@ void check_repository_format(struct repository_format *fmt)
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	repo_set_hash_algo(the_repository, fmt->hash_algo);
-	repo_set_compat_hash_algo(the_repository, GIT_HASH_UNKNOWN);
+	repo_set_compat_hash_algo(the_repository, fmt->compat_hash_algo);
 	the_repository->repository_format_worktree_config =
 		fmt->worktree_config;
 	the_repository->repository_format_partial_clone =
diff --git a/setup.h b/setup.h
index 58fd2605dd26..5d678ceb8caa 100644
--- a/setup.h
+++ b/setup.h
@@ -86,6 +86,7 @@ struct repository_format {
 	int worktree_config;
 	int is_bare;
 	int hash_algo;
+	int compat_hash_algo;
 	int sparse_index;
 	char *work_tree;
 	struct string_list unknown_extensions;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 22/30] rev-parse: add an --output-object-format parameter
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (20 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 21/30] repository: implement extensions.compatObjectFormat Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2024-02-08 16:25     ` Jean-Noël Avila
  2023-10-02  2:40   ` [PATCH v2 23/30] builtin/cat-file: let the oid determine the output algorithm Eric W. Biederman
                     ` (9 subsequent siblings)
  31 siblings, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

The new --output-object-format parameter returns the oid in the
specified format.

This is a generally useful plumbing facility.  It is useful for writing
test cases and for directly querying the translation maps.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Documentation/git-rev-parse.txt | 12 ++++++++++++
 builtin/rev-parse.c             | 23 +++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt
index f26a7591e373..f0f9021f2a5a 100644
--- a/Documentation/git-rev-parse.txt
+++ b/Documentation/git-rev-parse.txt
@@ -159,6 +159,18 @@ for another option.
 	unfortunately named tag "master"), and show them as full
 	refnames (e.g. "refs/heads/master").
 
+--output-object-format=(sha1|sha256|storage)::
+
+	Allow oids to be input from any object format that the current
+	repository supports.
+
+	Specifying "sha1" translates if necessary and returns a sha1 oid.
+
+	Specifying "sha256" translates if necessary and returns a sha256 oid.
+
+	Specifying "storage" translates if necessary and returns an oid in
+	encoded in the storage hash algorithm.
+
 Options for Objects
 ~~~~~~~~~~~~~~~~~~~
 
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 43e96765400c..0ef3e658cc5b 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -25,6 +25,7 @@
 #include "submodule.h"
 #include "commit-reach.h"
 #include "shallow.h"
+#include "object-file-convert.h"
 
 #define DO_REVS		1
 #define DO_NOREV	2
@@ -675,6 +676,8 @@ static void print_path(const char *path, const char *prefix, enum format_type fo
 int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 {
 	int i, as_is = 0, verify = 0, quiet = 0, revs_count = 0, type = 0;
+	const struct git_hash_algo *output_algo = NULL;
+	const struct git_hash_algo *compat = NULL;
 	int did_repo_setup = 0;
 	int has_dashdash = 0;
 	int output_prefix = 0;
@@ -746,6 +749,7 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 
 			prepare_repo_settings(the_repository);
 			the_repository->settings.command_requires_full_index = 0;
+			compat = the_repository->compat_hash_algo;
 		}
 
 		if (!strcmp(arg, "--")) {
@@ -833,6 +837,22 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				flags |= GET_OID_QUIETLY;
 				continue;
 			}
+			if (opt_with_value(arg, "--output-object-format", &arg)) {
+				if (!arg)
+					die(_("no object format specified"));
+				if (!strcmp(arg, the_hash_algo->name) ||
+				    !strcmp(arg, "storage")) {
+					flags |= GET_OID_HASH_ANY;
+					output_algo = the_hash_algo;
+					continue;
+				}
+				else if (compat && !strcmp(arg, compat->name)) {
+					flags |= GET_OID_HASH_ANY;
+					output_algo = compat;
+					continue;
+				}
+				else die(_("unsupported object format: %s"), arg);
+			}
 			if (opt_with_value(arg, "--short", &arg)) {
 				filter &= ~(DO_FLAGS|DO_NOREV);
 				verify = 1;
@@ -1083,6 +1103,9 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 		}
 		if (!get_oid_with_context(the_repository, name,
 					  flags, &oid, &unused)) {
+			if (output_algo)
+				repo_oid_to_algop(the_repository, &oid,
+						  output_algo, &oid);
 			if (verify)
 				revs_count++;
 			else
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 23/30] builtin/cat-file: let the oid determine the output algorithm
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (21 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 22/30] rev-parse: add an --output-object-format parameter Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 24/30] tree-walk: init_tree_desc take an oid to get the hash algorithm Eric W. Biederman
                     ` (8 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Use GET_OID_HASH_ANY when calling get_oid_with_context.  This
implements the semi-obvious behaviour that specifying a sha1 oid shows
the output for a sha1 encoded object, and specifying a sha256 oid
shows the output for a sha256 encoded object.

This is useful for testing the the conversion of an object to an
equivalent object encoded with a different hash function.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/cat-file.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 694c8538df2f..e615d1f8e0da 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -107,7 +107,10 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	struct object_info oi = OBJECT_INFO_INIT;
 	struct strbuf sb = STRBUF_INIT;
 	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
-	unsigned get_oid_flags = GET_OID_RECORD_PATH | GET_OID_ONLY_TO_DIE;
+	unsigned get_oid_flags =
+		GET_OID_RECORD_PATH |
+		GET_OID_ONLY_TO_DIE |
+		GET_OID_HASH_ANY;
 	const char *path = force_path;
 	const int opt_cw = (opt == 'c' || opt == 'w');
 	if (!path && opt_cw)
@@ -223,7 +226,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 								     &size);
 				const char *target;
 				if (!skip_prefix(buffer, "object ", &target) ||
-				    get_oid_hex(target, &blob_oid))
+				    get_oid_hex_algop(target, &blob_oid,
+						      &hash_algos[oid.algo]))
 					die("%s not a valid tag", oid_to_hex(&oid));
 				free(buffer);
 			} else
@@ -512,7 +516,9 @@ static void batch_one_object(const char *obj_name,
 			     struct expand_data *data)
 {
 	struct object_context ctx;
-	int flags = opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0;
+	int flags =
+		GET_OID_HASH_ANY |
+		(opt->follow_symlinks ? GET_OID_FOLLOW_SYMLINKS : 0);
 	enum get_oid_result result;
 
 	result = get_oid_with_context(the_repository, obj_name,
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 24/30] tree-walk: init_tree_desc take an oid to get the hash algorithm
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (22 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 23/30] builtin/cat-file: let the oid determine the output algorithm Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 25/30] object-file: handle compat objects in check_object_signature Eric W. Biederman
                     ` (7 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

To make it possible for git ls-tree to display the tree encoded
in the hash algorithm of the oid specified to git ls-tree, update
init_tree_desc to take as a parameter the oid of the tree object.

Update all callers of init_tree_desc and init_tree_desc_gently
to pass the oid of the tree object.

Use the oid of the tree object to discover the hash algorithm
of the oid and store that hash algorithm in struct tree_desc.

Use the hash algorithm in decode_tree_entry and
update_tree_entry_internal to handle reading a tree object encoded in
a hash algorithm that differs from the repositories hash algorithm.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 archive.c              |  3 ++-
 builtin/am.c           |  6 +++---
 builtin/checkout.c     |  8 +++++---
 builtin/clone.c        |  2 +-
 builtin/commit.c       |  2 +-
 builtin/grep.c         |  8 ++++----
 builtin/merge.c        |  3 ++-
 builtin/pack-objects.c |  6 ++++--
 builtin/read-tree.c    |  2 +-
 builtin/stash.c        |  5 +++--
 cache-tree.c           |  2 +-
 delta-islands.c        |  2 +-
 diff-lib.c             |  2 +-
 fsck.c                 |  6 ++++--
 http-push.c            |  2 +-
 list-objects.c         |  2 +-
 match-trees.c          |  4 ++--
 merge-ort.c            | 11 ++++++-----
 merge-recursive.c      |  2 +-
 merge.c                |  3 ++-
 pack-bitmap-write.c    |  2 +-
 packfile.c             |  3 ++-
 reflog.c               |  2 +-
 revision.c             |  4 ++--
 tree-walk.c            | 36 +++++++++++++++++++++---------------
 tree-walk.h            |  7 +++++--
 tree.c                 |  2 +-
 walker.c               |  2 +-
 28 files changed, 80 insertions(+), 59 deletions(-)

diff --git a/archive.c b/archive.c
index ca11db185b15..b10269aee7be 100644
--- a/archive.c
+++ b/archive.c
@@ -339,7 +339,8 @@ int write_archive_entries(struct archiver_args *args,
 		opts.src_index = args->repo->index;
 		opts.dst_index = args->repo->index;
 		opts.fn = oneway_merge;
-		init_tree_desc(&t, args->tree->buffer, args->tree->size);
+		init_tree_desc(&t, &args->tree->object.oid,
+			       args->tree->buffer, args->tree->size);
 		if (unpack_trees(1, &t, &opts))
 			return -1;
 		git_attr_set_direction(GIT_ATTR_INDEX);
diff --git a/builtin/am.c b/builtin/am.c
index 8bde034fae68..4dfd714b910e 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1991,8 +1991,8 @@ static int fast_forward_to(struct tree *head, struct tree *remote, int reset)
 	opts.reset = reset ? UNPACK_RESET_PROTECT_UNTRACKED : 0;
 	opts.preserve_ignored = 0; /* FIXME: !overwrite_ignore */
 	opts.fn = twoway_merge;
-	init_tree_desc(&t[0], head->buffer, head->size);
-	init_tree_desc(&t[1], remote->buffer, remote->size);
+	init_tree_desc(&t[0], &head->object.oid, head->buffer, head->size);
+	init_tree_desc(&t[1], &remote->object.oid, remote->buffer, remote->size);
 
 	if (unpack_trees(2, t, &opts)) {
 		rollback_lock_file(&lock_file);
@@ -2026,7 +2026,7 @@ static int merge_tree(struct tree *tree)
 	opts.dst_index = &the_index;
 	opts.merge = 1;
 	opts.fn = oneway_merge;
-	init_tree_desc(&t[0], tree->buffer, tree->size);
+	init_tree_desc(&t[0], &tree->object.oid, tree->buffer, tree->size);
 
 	if (unpack_trees(1, t, &opts)) {
 		rollback_lock_file(&lock_file);
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f53612f46870..03eff73fd031 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -701,7 +701,7 @@ static int reset_tree(struct tree *tree, const struct checkout_opts *o,
 			       info->commit ? &info->commit->object.oid : null_oid(),
 			       NULL);
 	parse_tree(tree);
-	init_tree_desc(&tree_desc, tree->buffer, tree->size);
+	init_tree_desc(&tree_desc, &tree->object.oid, tree->buffer, tree->size);
 	switch (unpack_trees(1, &tree_desc, &opts)) {
 	case -2:
 		*writeout_error = 1;
@@ -815,10 +815,12 @@ static int merge_working_tree(const struct checkout_opts *opts,
 			die(_("unable to parse commit %s"),
 				oid_to_hex(old_commit_oid));
 
-		init_tree_desc(&trees[0], tree->buffer, tree->size);
+		init_tree_desc(&trees[0], &tree->object.oid,
+			       tree->buffer, tree->size);
 		parse_tree(new_tree);
 		tree = new_tree;
-		init_tree_desc(&trees[1], tree->buffer, tree->size);
+		init_tree_desc(&trees[1], &tree->object.oid,
+			       tree->buffer, tree->size);
 
 		ret = unpack_trees(2, trees, &topts);
 		clear_unpack_trees_porcelain(&topts);
diff --git a/builtin/clone.c b/builtin/clone.c
index c6357af94989..79ceefb93995 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -737,7 +737,7 @@ static int checkout(int submodule_progress, int filter_submodules)
 	if (!tree)
 		die(_("unable to parse commit %s"), oid_to_hex(&oid));
 	parse_tree(tree);
-	init_tree_desc(&t, tree->buffer, tree->size);
+	init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts) < 0)
 		die(_("unable to checkout working tree"));
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 7da5f924484d..537319932b65 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -340,7 +340,7 @@ static void create_base_index(const struct commit *current_head)
 	if (!tree)
 		die(_("failed to unpack HEAD tree object"));
 	parse_tree(tree);
-	init_tree_desc(&t, tree->buffer, tree->size);
+	init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size);
 	if (unpack_trees(1, &t, &opts))
 		exit(128); /* We've already reported the error, finish dying */
 }
diff --git a/builtin/grep.c b/builtin/grep.c
index 50e712a18479..0c2b8a376f8e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -530,7 +530,7 @@ static int grep_submodule(struct grep_opt *opt,
 		strbuf_addstr(&base, filename);
 		strbuf_addch(&base, '/');
 
-		init_tree_desc(&tree, data, size);
+		init_tree_desc(&tree, oid, data, size);
 		hit = grep_tree(&subopt, pathspec, &tree, &base, base.len,
 				object_type == OBJ_COMMIT);
 		strbuf_release(&base);
@@ -574,7 +574,7 @@ static int grep_cache(struct grep_opt *opt,
 
 			data = repo_read_object_file(the_repository, &ce->oid,
 						     &type, &size);
-			init_tree_desc(&tree, data, size);
+			init_tree_desc(&tree, &ce->oid, data, size);
 
 			hit |= grep_tree(opt, pathspec, &tree, &name, 0, 0);
 			strbuf_setlen(&name, name_base_len);
@@ -670,7 +670,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 				    oid_to_hex(&entry.oid));
 
 			strbuf_addch(base, '/');
-			init_tree_desc(&sub, data, size);
+			init_tree_desc(&sub, &entry.oid, data, size);
 			hit |= grep_tree(opt, pathspec, &sub, base, tn_len,
 					 check_attr);
 			free(data);
@@ -714,7 +714,7 @@ static int grep_object(struct grep_opt *opt, const struct pathspec *pathspec,
 			strbuf_add(&base, name, len);
 			strbuf_addch(&base, ':');
 		}
-		init_tree_desc(&tree, data, size);
+		init_tree_desc(&tree, &obj->oid, data, size);
 		hit = grep_tree(opt, pathspec, &tree, &base, base.len,
 				obj->type == OBJ_COMMIT);
 		strbuf_release(&base);
diff --git a/builtin/merge.c b/builtin/merge.c
index de68910177fb..718165d45917 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -704,7 +704,8 @@ static int read_tree_trivial(struct object_id *common, struct object_id *head,
 	cache_tree_free(&the_index.cache_tree);
 	for (i = 0; i < nr_trees; i++) {
 		parse_tree(trees[i]);
-		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
+		init_tree_desc(t+i, &trees[i]->object.oid,
+			       trees[i]->buffer, trees[i]->size);
 	}
 	if (unpack_trees(nr_trees, t, &opts))
 		return -1;
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d2a162d52804..d34902002656 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1756,7 +1756,8 @@ static void add_pbase_object(struct tree_desc *tree,
 			tree = pbase_tree_get(&entry.oid);
 			if (!tree)
 				return;
-			init_tree_desc(&sub, tree->tree_data, tree->tree_size);
+			init_tree_desc(&sub, &tree->oid,
+				       tree->tree_data, tree->tree_size);
 
 			add_pbase_object(&sub, down, downlen, fullname);
 			pbase_tree_put(tree);
@@ -1816,7 +1817,8 @@ static void add_preferred_base_object(const char *name)
 		}
 		else {
 			struct tree_desc tree;
-			init_tree_desc(&tree, it->pcache.tree_data, it->pcache.tree_size);
+			init_tree_desc(&tree, &it->pcache.oid,
+				       it->pcache.tree_data, it->pcache.tree_size);
 			add_pbase_object(&tree, name, cmplen, name);
 		}
 	}
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 1fec702a04fa..24d6d156d3a2 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -264,7 +264,7 @@ int cmd_read_tree(int argc, const char **argv, const char *cmd_prefix)
 	for (i = 0; i < nr_trees; i++) {
 		struct tree *tree = trees[i];
 		parse_tree(tree);
-		init_tree_desc(t+i, tree->buffer, tree->size);
+		init_tree_desc(t+i, &tree->object.oid, tree->buffer, tree->size);
 	}
 	if (unpack_trees(nr_trees, t, &opts))
 		return 128;
diff --git a/builtin/stash.c b/builtin/stash.c
index fe64cde9ce30..9ee52af4d28e 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -285,7 +285,7 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
 	if (parse_tree(tree))
 		return -1;
 
-	init_tree_desc(t, tree->buffer, tree->size);
+	init_tree_desc(t, &tree->object.oid, tree->buffer, tree->size);
 
 	opts.head_idx = 1;
 	opts.src_index = &the_index;
@@ -871,7 +871,8 @@ static void diff_include_untracked(const struct stash_info *info, struct diff_op
 		tree[i] = parse_tree_indirect(oid[i]);
 		if (parse_tree(tree[i]) < 0)
 			die(_("failed to parse tree"));
-		init_tree_desc(&tree_desc[i], tree[i]->buffer, tree[i]->size);
+		init_tree_desc(&tree_desc[i], &tree[i]->object.oid,
+			       tree[i]->buffer, tree[i]->size);
 	}
 
 	unpack_tree_opt.head_idx = -1;
diff --git a/cache-tree.c b/cache-tree.c
index ddc7d3d86959..334973a01cee 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -770,7 +770,7 @@ static void prime_cache_tree_rec(struct repository *r,
 
 	oidcpy(&it->oid, &tree->object.oid);
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	cnt = 0;
 	while (tree_entry(&desc, &entry)) {
 		if (!S_ISDIR(entry.mode))
diff --git a/delta-islands.c b/delta-islands.c
index 5de5759f3f13..1ff3506b10f2 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -289,7 +289,7 @@ void resolve_tree_islands(struct repository *r,
 		if (!tree || parse_tree(tree) < 0)
 			die(_("bad tree object %s"), oid_to_hex(&ent->idx.oid));
 
-		init_tree_desc(&desc, tree->buffer, tree->size);
+		init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 		while (tree_entry(&desc, &entry)) {
 			struct object *obj;
 
diff --git a/diff-lib.c b/diff-lib.c
index 6b0c6a7180cc..add323f5628d 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -558,7 +558,7 @@ static int diff_cache(struct rev_info *revs,
 	opts.pathspec = &revs->diffopt.pathspec;
 	opts.pathspec->recursive = 1;
 
-	init_tree_desc(&t, tree->buffer, tree->size);
+	init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size);
 	return unpack_trees(1, &t, &opts);
 }
 
diff --git a/fsck.c b/fsck.c
index 2b1e348005b7..6b492a48da82 100644
--- a/fsck.c
+++ b/fsck.c
@@ -313,7 +313,8 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 		return -1;
 
 	name = fsck_get_object_name(options, &tree->object.oid);
-	if (init_tree_desc_gently(&desc, tree->buffer, tree->size, 0))
+	if (init_tree_desc_gently(&desc, &tree->object.oid,
+				  tree->buffer, tree->size, 0))
 		return -1;
 	while (tree_entry_gently(&desc, &entry)) {
 		struct object *obj;
@@ -583,7 +584,8 @@ static int fsck_tree(const struct object_id *tree_oid,
 	const char *o_name;
 	struct name_stack df_dup_candidates = { NULL };
 
-	if (init_tree_desc_gently(&desc, buffer, size, TREE_DESC_RAW_MODES)) {
+	if (init_tree_desc_gently(&desc, tree_oid, buffer, size,
+				  TREE_DESC_RAW_MODES)) {
 		retval += report(options, tree_oid, OBJ_TREE,
 				 FSCK_MSG_BAD_TREE,
 				 "cannot be parsed as a tree");
diff --git a/http-push.c b/http-push.c
index a704f490fdb2..81c35b5e96f7 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1308,7 +1308,7 @@ static struct object_list **process_tree(struct tree *tree,
 	obj->flags |= SEEN;
 	p = add_one_object(obj, p);
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry))
 		switch (object_type(entry.mode)) {
diff --git a/list-objects.c b/list-objects.c
index e60a6cd5b46e..312335c8a7f2 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -97,7 +97,7 @@ static void process_tree_contents(struct traversal_context *ctx,
 	enum interesting match = ctx->revs->diffopt.pathspec.nr == 0 ?
 		all_entries_interesting : entry_not_interesting;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry)) {
 		if (match != all_entries_interesting) {
diff --git a/match-trees.c b/match-trees.c
index 0885ac681cd5..3412b6a1401d 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -63,7 +63,7 @@ static void *fill_tree_desc_strict(struct tree_desc *desc,
 		die("unable to read tree (%s)", oid_to_hex(hash));
 	if (type != OBJ_TREE)
 		die("%s is not a tree", oid_to_hex(hash));
-	init_tree_desc(desc, buffer, size);
+	init_tree_desc(desc, hash, buffer, size);
 	return buffer;
 }
 
@@ -194,7 +194,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	buf = repo_read_object_file(the_repository, oid1, &type, &sz);
 	if (!buf)
 		die("cannot read tree %s", oid_to_hex(oid1));
-	init_tree_desc(&desc, buf, sz);
+	init_tree_desc(&desc, oid1, buf, sz);
 
 	rewrite_here = NULL;
 	while (desc.size) {
diff --git a/merge-ort.c b/merge-ort.c
index 8631c997002d..3a5729c91e48 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1679,9 +1679,10 @@ static int collect_merge_info(struct merge_options *opt,
 	parse_tree(merge_base);
 	parse_tree(side1);
 	parse_tree(side2);
-	init_tree_desc(t + 0, merge_base->buffer, merge_base->size);
-	init_tree_desc(t + 1, side1->buffer, side1->size);
-	init_tree_desc(t + 2, side2->buffer, side2->size);
+	init_tree_desc(t + 0, &merge_base->object.oid,
+		       merge_base->buffer, merge_base->size);
+	init_tree_desc(t + 1, &side1->object.oid, side1->buffer, side1->size);
+	init_tree_desc(t + 2, &side2->object.oid, side2->buffer, side2->size);
 
 	trace2_region_enter("merge", "traverse_trees", opt->repo);
 	ret = traverse_trees(NULL, 3, t, &info);
@@ -4400,9 +4401,9 @@ static int checkout(struct merge_options *opt,
 	unpack_opts.fn = twoway_merge;
 	unpack_opts.preserve_ignored = 0; /* FIXME: !opts->overwrite_ignore */
 	parse_tree(prev);
-	init_tree_desc(&trees[0], prev->buffer, prev->size);
+	init_tree_desc(&trees[0], &prev->object.oid, prev->buffer, prev->size);
 	parse_tree(next);
-	init_tree_desc(&trees[1], next->buffer, next->size);
+	init_tree_desc(&trees[1], &next->object.oid, next->buffer, next->size);
 
 	ret = unpack_trees(2, trees, &unpack_opts);
 	clear_unpack_trees_porcelain(&unpack_opts);
diff --git a/merge-recursive.c b/merge-recursive.c
index 6a4081bb0f52..93df9eecdd95 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -411,7 +411,7 @@ static inline int merge_detect_rename(struct merge_options *opt)
 static void init_tree_desc_from_tree(struct tree_desc *desc, struct tree *tree)
 {
 	parse_tree(tree);
-	init_tree_desc(desc, tree->buffer, tree->size);
+	init_tree_desc(desc, &tree->object.oid, tree->buffer, tree->size);
 }
 
 static int unpack_trees_start(struct merge_options *opt,
diff --git a/merge.c b/merge.c
index b60925459c29..86179c34102d 100644
--- a/merge.c
+++ b/merge.c
@@ -81,7 +81,8 @@ int checkout_fast_forward(struct repository *r,
 	}
 	for (i = 0; i < nr_trees; i++) {
 		parse_tree(trees[i]);
-		init_tree_desc(t+i, trees[i]->buffer, trees[i]->size);
+		init_tree_desc(t+i, &trees[i]->object.oid,
+			       trees[i]->buffer, trees[i]->size);
 	}
 
 	memset(&opts, 0, sizeof(opts));
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index f6757c3cbf20..9211e08f0127 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -366,7 +366,7 @@ static int fill_bitmap_tree(struct bitmap *bitmap,
 	if (parse_tree(tree) < 0)
 		die("unable to load tree object %s",
 		    oid_to_hex(&tree->object.oid));
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
diff --git a/packfile.c b/packfile.c
index 9cc0a2e37a83..1fae0fcdd9e7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2250,7 +2250,8 @@ static int add_promisor_object(const struct object_id *oid,
 		struct tree *tree = (struct tree *)obj;
 		struct tree_desc desc;
 		struct name_entry entry;
-		if (init_tree_desc_gently(&desc, tree->buffer, tree->size, 0))
+		if (init_tree_desc_gently(&desc, &tree->object.oid,
+					  tree->buffer, tree->size, 0))
 			/*
 			 * Error messages are given when packs are
 			 * verified, so do not print any here.
diff --git a/reflog.c b/reflog.c
index 9ad50e7d93e4..c6992a19268f 100644
--- a/reflog.c
+++ b/reflog.c
@@ -40,7 +40,7 @@ static int tree_is_complete(const struct object_id *oid)
 		tree->buffer = data;
 		tree->size = size;
 	}
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	complete = 1;
 	while (tree_entry(&desc, &entry)) {
 		if (!repo_has_object_file(the_repository, &entry.oid) ||
diff --git a/revision.c b/revision.c
index 2f4c53ea207b..a60dfc23a2a5 100644
--- a/revision.c
+++ b/revision.c
@@ -82,7 +82,7 @@ static void mark_tree_contents_uninteresting(struct repository *r,
 	if (parse_tree_gently(tree, 1) < 0)
 		return;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
@@ -189,7 +189,7 @@ static void add_children_by_path(struct repository *r,
 	if (parse_tree_gently(tree, 1) < 0)
 		return;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
diff --git a/tree-walk.c b/tree-walk.c
index 3af50a01c2c7..0b44ec7c75ff 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -15,7 +15,7 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 	const char *path;
 	unsigned int len;
 	uint16_t mode;
-	const unsigned hashsz = the_hash_algo->rawsz;
+	const unsigned hashsz = desc->algo->rawsz;
 
 	if (size < hashsz + 3 || buf[size - (hashsz + 1)]) {
 		strbuf_addstr(err, _("too-short tree object"));
@@ -37,15 +37,19 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 	desc->entry.path = path;
 	desc->entry.mode = (desc->flags & TREE_DESC_RAW_MODES) ? mode : canon_mode(mode);
 	desc->entry.pathlen = len - 1;
-	oidread(&desc->entry.oid, (const unsigned char *)path + len);
+	oidread_algop(&desc->entry.oid, (const unsigned char *)path + len,
+		      desc->algo);
 
 	return 0;
 }
 
-static int init_tree_desc_internal(struct tree_desc *desc, const void *buffer,
-				   unsigned long size, struct strbuf *err,
+static int init_tree_desc_internal(struct tree_desc *desc,
+				   const struct object_id *oid,
+				   const void *buffer, unsigned long size,
+				   struct strbuf *err,
 				   enum tree_desc_flags flags)
 {
+	desc->algo = (oid && oid->algo) ? &hash_algos[oid->algo] : the_hash_algo;
 	desc->buffer = buffer;
 	desc->size = size;
 	desc->flags = flags;
@@ -54,19 +58,21 @@ static int init_tree_desc_internal(struct tree_desc *desc, const void *buffer,
 	return 0;
 }
 
-void init_tree_desc(struct tree_desc *desc, const void *buffer, unsigned long size)
+void init_tree_desc(struct tree_desc *desc, const struct object_id *tree_oid,
+		    const void *buffer, unsigned long size)
 {
 	struct strbuf err = STRBUF_INIT;
-	if (init_tree_desc_internal(desc, buffer, size, &err, 0))
+	if (init_tree_desc_internal(desc, tree_oid, buffer, size, &err, 0))
 		die("%s", err.buf);
 	strbuf_release(&err);
 }
 
-int init_tree_desc_gently(struct tree_desc *desc, const void *buffer, unsigned long size,
+int init_tree_desc_gently(struct tree_desc *desc, const struct object_id *oid,
+			  const void *buffer, unsigned long size,
 			  enum tree_desc_flags flags)
 {
 	struct strbuf err = STRBUF_INIT;
-	int result = init_tree_desc_internal(desc, buffer, size, &err, flags);
+	int result = init_tree_desc_internal(desc, oid, buffer, size, &err, flags);
 	if (result)
 		error("%s", err.buf);
 	strbuf_release(&err);
@@ -85,7 +91,7 @@ void *fill_tree_descriptor(struct repository *r,
 		if (!buf)
 			die("unable to read tree %s", oid_to_hex(oid));
 	}
-	init_tree_desc(desc, buf, size);
+	init_tree_desc(desc, oid, buf, size);
 	return buf;
 }
 
@@ -102,7 +108,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a)
 static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err)
 {
 	const void *buf = desc->buffer;
-	const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + the_hash_algo->rawsz;
+	const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + desc->algo->rawsz;
 	unsigned long size = desc->size;
 	unsigned long len = end - (const unsigned char *)buf;
 
@@ -611,7 +617,7 @@ int get_tree_entry(struct repository *r,
 		retval = -1;
 	} else {
 		struct tree_desc t;
-		init_tree_desc(&t, tree, size);
+		init_tree_desc(&t, tree_oid, tree, size);
 		retval = find_tree_entry(r, &t, name, oid, mode);
 	}
 	free(tree);
@@ -654,7 +660,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 	struct tree_desc t;
 	int follows_remaining = GET_TREE_ENTRY_FOLLOW_SYMLINKS_MAX_LINKS;
 
-	init_tree_desc(&t, NULL, 0UL);
+	init_tree_desc(&t, NULL, NULL, 0UL);
 	strbuf_addstr(&namebuf, name);
 	oidcpy(&current_tree_oid, tree_oid);
 
@@ -690,7 +696,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 				goto done;
 
 			/* descend */
-			init_tree_desc(&t, tree, size);
+			init_tree_desc(&t, &current_tree_oid, tree, size);
 		}
 
 		/* Handle symlinks to e.g. a//b by removing leading slashes */
@@ -724,7 +730,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 			free(parent->tree);
 			parents_nr--;
 			parent = &parents[parents_nr - 1];
-			init_tree_desc(&t, parent->tree, parent->size);
+			init_tree_desc(&t, &parent->oid, parent->tree, parent->size);
 			strbuf_remove(&namebuf, 0, remainder ? 3 : 2);
 			continue;
 		}
@@ -804,7 +810,7 @@ enum get_oid_result get_tree_entry_follow_symlinks(struct repository *r,
 			contents_start = contents;
 
 			parent = &parents[parents_nr - 1];
-			init_tree_desc(&t, parent->tree, parent->size);
+			init_tree_desc(&t, &parent->oid, parent->tree, parent->size);
 			strbuf_splice(&namebuf, 0, len,
 				      contents_start, link_len);
 			if (remainder)
diff --git a/tree-walk.h b/tree-walk.h
index 74cdceb3fed2..cf54d01019e9 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -26,6 +26,7 @@ struct name_entry {
  * A semi-opaque data structure used to maintain the current state of the walk.
  */
 struct tree_desc {
+	const struct git_hash_algo *algo;
 	/*
 	 * pointer into the memory representation of the tree. It always
 	 * points at the current entry being visited.
@@ -85,9 +86,11 @@ int update_tree_entry_gently(struct tree_desc *);
  * size parameters are assumed to be the same as the buffer and size
  * members of `struct tree`.
  */
-void init_tree_desc(struct tree_desc *desc, const void *buf, unsigned long size);
+void init_tree_desc(struct tree_desc *desc, const struct object_id *tree_oid,
+		    const void *buf, unsigned long size);
 
-int init_tree_desc_gently(struct tree_desc *desc, const void *buf, unsigned long size,
+int init_tree_desc_gently(struct tree_desc *desc, const struct object_id *oid,
+			  const void *buf, unsigned long size,
 			  enum tree_desc_flags flags);
 
 /*
diff --git a/tree.c b/tree.c
index c745462f968e..44bcf728f10a 100644
--- a/tree.c
+++ b/tree.c
@@ -27,7 +27,7 @@ int read_tree_at(struct repository *r,
 	if (parse_tree(tree))
 		return -1;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 
 	while (tree_entry(&desc, &entry)) {
 		if (retval != all_entries_interesting) {
diff --git a/walker.c b/walker.c
index 65002a7220ad..c0fd632d921c 100644
--- a/walker.c
+++ b/walker.c
@@ -45,7 +45,7 @@ static int process_tree(struct walker *walker, struct tree *tree)
 	if (parse_tree(tree))
 		return -1;
 
-	init_tree_desc(&desc, tree->buffer, tree->size);
+	init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size);
 	while (tree_entry(&desc, &entry)) {
 		struct object *obj = NULL;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 25/30] object-file: handle compat objects in check_object_signature
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (23 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 24/30] tree-walk: init_tree_desc take an oid to get the hash algorithm Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 26/30] builtin/ls-tree: let the oid determine the output algorithm Eric W. Biederman
                     ` (6 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Update check_object_signature to find the hash algorithm the exising
signature uses, and to use the same hash algorithm when recomputing it
to check the signature is valid.

This will be useful when teaching git ls-tree to display objects
encoded with the compat hash algorithm.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 object-file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index b2d43d009898..5fa4b14baee0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1094,9 +1094,11 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size,
 			   enum object_type type)
 {
+	const struct git_hash_algo *algo =
+		oid->algo ? &hash_algos[oid->algo] : r->hash_algo;
 	struct object_id real_oid;
 
-	hash_object_file(r->hash_algo, buf, size, type, &real_oid);
+	hash_object_file(algo, buf, size, type, &real_oid);
 
 	return !oideq(oid, &real_oid) ? -1 : 0;
 }
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 26/30] builtin/ls-tree: let the oid determine the output algorithm
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (24 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 25/30] object-file: handle compat objects in check_object_signature Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 27/30] test-lib: compute the compatibility hash so tests may use it Eric W. Biederman
                     ` (5 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Update cmd_ls_tree to call get_oid_with_context and pass
GET_OID_HASH_ANY instead of calling the simpler repo_get_oid.

This implments in ls-tree the behavior that asking to display a sha1
hash displays the corrresponding sha1 encoded object and asking to
display a sha256 hash displayes the corresponding sha256 encoded
object.

This is useful for testing the conversion of an object to an
equivlanet object encoded with a different hash function.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 builtin/ls-tree.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index f558db5f3b80..71281ab705b6 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -376,6 +376,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 		OPT_END()
 	};
 	struct ls_tree_cmdmode_to_fmt *m2f = ls_tree_cmdmode_format;
+	struct object_context obj_context;
 	int ret;
 
 	git_config(git_default_config, NULL);
@@ -407,7 +408,9 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
 			ls_tree_usage, ls_tree_options);
 	if (argc < 1)
 		usage_with_options(ls_tree_usage, ls_tree_options);
-	if (repo_get_oid(the_repository, argv[0], &oid))
+	if (get_oid_with_context(the_repository, argv[0],
+				 GET_OID_HASH_ANY, &oid,
+				 &obj_context))
 		die("Not a valid object name %s", argv[0]);
 
 	/*
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 27/30] test-lib: compute the compatibility hash so tests may use it
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (25 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 26/30] builtin/ls-tree: let the oid determine the output algorithm Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 28/30] t1006: rename sha1 to oid Eric W. Biederman
                     ` (4 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/test-lib-functions.sh | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 2f8868caa171..92b462e2e711 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1599,7 +1599,16 @@ test_set_hash () {
 
 # Detect the hash algorithm in use.
 test_detect_hash () {
-	test_hash_algo="${GIT_TEST_DEFAULT_HASH:-sha1}"
+	case "$GIT_TEST_DEFAULT_HASH" in
+	"sha256")
+	    test_hash_algo=sha256
+	    test_compat_hash_algo=sha1
+	    ;;
+	*)
+	    test_hash_algo=sha1
+	    test_compat_hash_algo=sha256
+	    ;;
+	esac
 }
 
 # Load common hash metadata and common placeholder object IDs for use with
@@ -1651,6 +1660,12 @@ test_oid () {
 	local algo="${test_hash_algo}" &&
 
 	case "$1" in
+	--hash=storage)
+		algo="$test_hash_algo" &&
+		shift;;
+	--hash=compat)
+		algo="$test_compat_hash_algo" &&
+		shift;;
 	--hash=*)
 		algo="${1#--hash=}" &&
 		shift;;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 28/30] t1006: rename sha1 to oid
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (26 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 27/30] test-lib: compute the compatibility hash so tests may use it Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 29/30] t1006: test oid compatibility with cat-file Eric W. Biederman
                     ` (3 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Before I extend this test, changing the naming of the relevant
hash from sha1 to oid.  Calling the hash sha1 is incorrect today
as it can be either sha1 or sha256 depending on the value of
GIT_DEFAULT_HASH_FUNCTION when the test is called.

I plan to test sha1 and sha256 simultaneously in the same repository.
Having a name like sha1 will be even more confusing.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/t1006-cat-file.sh | 220 ++++++++++++++++++++++----------------------
 1 file changed, 110 insertions(+), 110 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index d73a0be1b9d1..9b018b538950 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -112,65 +112,65 @@ strlen () {
 
 run_tests () {
     type=$1
-    sha1=$2
+    oid=$2
     size=$3
     content=$4
     pretty_content=$5
 
-    batch_output="$sha1 $type $size
+    batch_output="$oid $type $size
 $content"
 
     test_expect_success "$type exists" '
-	git cat-file -e $sha1
+	git cat-file -e $oid
     '
 
     test_expect_success "Type of $type is correct" '
 	echo $type >expect &&
-	git cat-file -t $sha1 >actual &&
+	git cat-file -t $oid >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "Size of $type is correct" '
 	echo $size >expect &&
-	git cat-file -s $sha1 >actual &&
+	git cat-file -s $oid >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "Type of $type is correct using --allow-unknown-type" '
 	echo $type >expect &&
-	git cat-file -t --allow-unknown-type $sha1 >actual &&
+	git cat-file -t --allow-unknown-type $oid >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "Size of $type is correct using --allow-unknown-type" '
 	echo $size >expect &&
-	git cat-file -s --allow-unknown-type $sha1 >actual &&
+	git cat-file -s --allow-unknown-type $oid >actual &&
 	test_cmp expect actual
     '
 
     test -z "$content" ||
     test_expect_success "Content of $type is correct" '
 	echo_without_newline "$content" >expect &&
-	git cat-file $type $sha1 >actual &&
+	git cat-file $type $oid >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "Pretty content of $type is correct" '
 	echo_without_newline "$pretty_content" >expect &&
-	git cat-file -p $sha1 >actual &&
+	git cat-file -p $oid >actual &&
 	test_cmp expect actual
     '
 
     test -z "$content" ||
     test_expect_success "--batch output of $type is correct" '
 	echo "$batch_output" >expect &&
-	echo $sha1 | git cat-file --batch >actual &&
+	echo $oid | git cat-file --batch >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "--batch-check output of $type is correct" '
-	echo "$sha1 $type $size" >expect &&
-	echo_without_newline $sha1 | git cat-file --batch-check >actual &&
+	echo "$oid $type $size" >expect &&
+	echo_without_newline $oid | git cat-file --batch-check >actual &&
 	test_cmp expect actual
     '
 
@@ -179,33 +179,33 @@ $content"
 	test -z "$content" ||
 		test_expect_success "--batch-command $opt output of $type content is correct" '
 		echo "$batch_output" >expect &&
-		test_write_lines "contents $sha1" | git cat-file --batch-command $opt >actual &&
+		test_write_lines "contents $oid" | git cat-file --batch-command $opt >actual &&
 		test_cmp expect actual
 	'
 
 	test_expect_success "--batch-command $opt output of $type info is correct" '
-		echo "$sha1 $type $size" >expect &&
-		test_write_lines "info $sha1" |
+		echo "$oid $type $size" >expect &&
+		test_write_lines "info $oid" |
 		git cat-file --batch-command $opt >actual &&
 		test_cmp expect actual
 	'
     done
 
     test_expect_success "custom --batch-check format" '
-	echo "$type $sha1" >expect &&
-	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
+	echo "$type $oid" >expect &&
+	echo $oid | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success "custom --batch-command format" '
-	echo "$type $sha1" >expect &&
-	echo "info $sha1" | git cat-file --batch-command="%(objecttype) %(objectname)" >actual &&
+	echo "$type $oid" >expect &&
+	echo "info $oid" | git cat-file --batch-command="%(objecttype) %(objectname)" >actual &&
 	test_cmp expect actual
     '
 
     test_expect_success '--batch-check with %(rest)' '
 	echo "$type this is some extra content" >expect &&
-	echo "$sha1    this is some extra content" |
+	echo "$oid    this is some extra content" |
 		git cat-file --batch-check="%(objecttype) %(rest)" >actual &&
 	test_cmp expect actual
     '
@@ -216,7 +216,7 @@ $content"
 		echo "$size" &&
 		echo "$content"
 	} >expect &&
-	echo $sha1 | git cat-file --batch="%(objectsize)" >actual &&
+	echo $oid | git cat-file --batch="%(objectsize)" >actual &&
 	test_cmp expect actual
     '
 
@@ -226,25 +226,25 @@ $content"
 		echo "$type" &&
 		echo "$content"
 	} >expect &&
-	echo $sha1 | git cat-file --batch="%(objecttype)" >actual &&
+	echo $oid | git cat-file --batch="%(objecttype)" >actual &&
 	test_cmp expect actual
     '
 }
 
 hello_content="Hello World"
 hello_size=$(strlen "$hello_content")
-hello_sha1=$(echo_without_newline "$hello_content" | git hash-object --stdin)
+hello_oid=$(echo_without_newline "$hello_content" | git hash-object --stdin)
 
 test_expect_success "setup" '
 	echo_without_newline "$hello_content" > hello &&
 	git update-index --add hello
 '
 
-run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
+run_tests 'blob' $hello_oid $hello_size "$hello_content" "$hello_content"
 
 test_expect_success '--batch-command --buffer with flush for blob info' '
-	echo "$hello_sha1 blob $hello_size" >expect &&
-	test_write_lines "info $hello_sha1" "flush" |
+	echo "$hello_oid blob $hello_size" >expect &&
+	test_write_lines "info $hello_oid" "flush" |
 	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
 	git cat-file --batch-command --buffer >actual &&
 	test_cmp expect actual
@@ -252,38 +252,38 @@ test_expect_success '--batch-command --buffer with flush for blob info' '
 
 test_expect_success '--batch-command --buffer without flush for blob info' '
 	touch output &&
-	test_write_lines "info $hello_sha1" |
+	test_write_lines "info $hello_oid" |
 	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
 	git cat-file --batch-command --buffer >>output &&
 	test_must_be_empty output
 '
 
 test_expect_success '--batch-check without %(rest) considers whole line' '
-	echo "$hello_sha1 blob $hello_size" >expect &&
-	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
+	echo "$hello_oid blob $hello_size" >expect &&
+	git update-index --add --cacheinfo 100644 $hello_oid "white space" &&
 	test_when_finished "git update-index --remove \"white space\"" &&
 	echo ":white space" | git cat-file --batch-check >actual &&
 	test_cmp expect actual
 '
 
-tree_sha1=$(git write-tree)
+tree_oid=$(git write-tree)
 tree_size=$(($(test_oid rawsz) + 13))
-tree_pretty_content="100644 blob $hello_sha1	hello${LF}"
+tree_pretty_content="100644 blob $hello_oid	hello${LF}"
 
-run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
+run_tests 'tree' $tree_oid $tree_size "" "$tree_pretty_content"
 
 commit_message="Initial commit"
-commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree $tree_sha1)
+commit_oid=$(echo_without_newline "$commit_message" | git commit-tree $tree_oid)
 commit_size=$(($(test_oid hexsz) + 137))
-commit_content="tree $tree_sha1
+commit_content="tree $tree_oid
 author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> $GIT_AUTHOR_DATE
 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
 
 $commit_message"
 
-run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content"
+run_tests 'commit' $commit_oid $commit_size "$commit_content" "$commit_content"
 
-tag_header_without_timestamp="object $hello_sha1
+tag_header_without_timestamp="object $hello_oid
 type blob
 tag hellotag
 tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
@@ -292,14 +292,14 @@ tag_content="$tag_header_without_timestamp 0 +0000
 
 $tag_description"
 
-tag_sha1=$(echo_without_newline "$tag_content" | git hash-object -t tag --stdin -w)
+tag_oid=$(echo_without_newline "$tag_content" | git hash-object -t tag --stdin -w)
 tag_size=$(strlen "$tag_content")
 
-run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_content"
+run_tests 'tag' $tag_oid $tag_size "$tag_content" "$tag_content"
 
 test_expect_success "Reach a blob from a tag pointing to it" '
 	echo_without_newline "$hello_content" >expect &&
-	git cat-file blob $tag_sha1 >actual &&
+	git cat-file blob $tag_oid >actual &&
 	test_cmp expect actual
 '
 
@@ -308,31 +308,31 @@ do
     for opt in t s e p
     do
 	test_expect_success "Passing -$opt with --$batch fails" '
-	    test_must_fail git cat-file --$batch -$opt $hello_sha1
+	    test_must_fail git cat-file --$batch -$opt $hello_oid
 	'
 
 	test_expect_success "Passing --$batch with -$opt fails" '
-	    test_must_fail git cat-file -$opt --$batch $hello_sha1
+	    test_must_fail git cat-file -$opt --$batch $hello_oid
 	'
     done
 
     test_expect_success "Passing <type> with --$batch fails" '
-	test_must_fail git cat-file --$batch blob $hello_sha1
+	test_must_fail git cat-file --$batch blob $hello_oid
     '
 
     test_expect_success "Passing --$batch with <type> fails" '
-	test_must_fail git cat-file blob --$batch $hello_sha1
+	test_must_fail git cat-file blob --$batch $hello_oid
     '
 
-    test_expect_success "Passing sha1 with --$batch fails" '
-	test_must_fail git cat-file --$batch $hello_sha1
+    test_expect_success "Passing oid with --$batch fails" '
+	test_must_fail git cat-file --$batch $hello_oid
     '
 done
 
 for opt in t s e p
 do
     test_expect_success "Passing -$opt with --follow-symlinks fails" '
-	    test_must_fail git cat-file --follow-symlinks -$opt $hello_sha1
+	    test_must_fail git cat-file --follow-symlinks -$opt $hello_oid
 	'
 done
 
@@ -360,12 +360,12 @@ test_expect_success "--batch-check for a non-existent hash" '
 
 test_expect_success "--batch for an existent and a non-existent hash" '
 	cat >expect <<-EOF &&
-	$tag_sha1 tag $tag_size
+	$tag_oid tag $tag_size
 	$tag_content
 	0000000000000000000000000000000000000000 missing
 	EOF
 
-	printf "$tag_sha1\n0000000000000000000000000000000000000000" >in &&
+	printf "$tag_oid\n0000000000000000000000000000000000000000" >in &&
 	git cat-file --batch <in >actual &&
 	test_cmp expect actual
 '
@@ -386,74 +386,74 @@ test_expect_success 'empty --batch-check notices missing object' '
 	test_cmp expect actual
 '
 
-batch_input="$hello_sha1
-$commit_sha1
-$tag_sha1
+batch_input="$hello_oid
+$commit_oid
+$tag_oid
 deadbeef
 
 "
 
 printf "%s\0" \
-	"$hello_sha1 blob $hello_size" \
+	"$hello_oid blob $hello_size" \
 	"$hello_content" \
-	"$commit_sha1 commit $commit_size" \
+	"$commit_oid commit $commit_size" \
 	"$commit_content" \
-	"$tag_sha1 tag $tag_size" \
+	"$tag_oid tag $tag_size" \
 	"$tag_content" \
 	"deadbeef missing" \
 	" missing" >batch_output
 
-test_expect_success '--batch with multiple sha1s gives correct format' '
+test_expect_success '--batch with multiple oids gives correct format' '
 	tr "\0" "\n" <batch_output >expect &&
 	echo_without_newline "$batch_input" >in &&
 	git cat-file --batch <in >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success '--batch, -z with multiple sha1s gives correct format' '
+test_expect_success '--batch, -z with multiple oids gives correct format' '
 	echo_without_newline_nul "$batch_input" >in &&
 	tr "\0" "\n" <batch_output >expect &&
 	git cat-file --batch -z <in >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success '--batch, -Z with multiple sha1s gives correct format' '
+test_expect_success '--batch, -Z with multiple oids gives correct format' '
 	echo_without_newline_nul "$batch_input" >in &&
 	git cat-file --batch -Z <in >actual &&
 	test_cmp batch_output actual
 '
 
-batch_check_input="$hello_sha1
-$tree_sha1
-$commit_sha1
-$tag_sha1
+batch_check_input="$hello_oid
+$tree_oid
+$commit_oid
+$tag_oid
 deadbeef
 
 "
 
 printf "%s\0" \
-	"$hello_sha1 blob $hello_size" \
-	"$tree_sha1 tree $tree_size" \
-	"$commit_sha1 commit $commit_size" \
-	"$tag_sha1 tag $tag_size" \
+	"$hello_oid blob $hello_size" \
+	"$tree_oid tree $tree_size" \
+	"$commit_oid commit $commit_size" \
+	"$tag_oid tag $tag_size" \
 	"deadbeef missing" \
 	" missing" >batch_check_output
 
-test_expect_success "--batch-check with multiple sha1s gives correct format" '
+test_expect_success "--batch-check with multiple oids gives correct format" '
 	tr "\0" "\n" <batch_check_output >expect &&
 	echo_without_newline "$batch_check_input" >in &&
 	git cat-file --batch-check <in >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success "--batch-check, -z with multiple sha1s gives correct format" '
+test_expect_success "--batch-check, -z with multiple oids gives correct format" '
 	tr "\0" "\n" <batch_check_output >expect &&
 	echo_without_newline_nul "$batch_check_input" >in &&
 	git cat-file --batch-check -z <in >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success "--batch-check, -Z with multiple sha1s gives correct format" '
+test_expect_success "--batch-check, -Z with multiple oids gives correct format" '
 	echo_without_newline_nul "$batch_check_input" >in &&
 	git cat-file --batch-check -Z <in >actual &&
 	test_cmp batch_check_output actual
@@ -480,18 +480,18 @@ test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
 	test_cmp expect actual
 '
 
-batch_command_multiple_info="info $hello_sha1
-info $tree_sha1
-info $commit_sha1
-info $tag_sha1
+batch_command_multiple_info="info $hello_oid
+info $tree_oid
+info $commit_oid
+info $tag_oid
 info deadbeef"
 
 test_expect_success '--batch-command with multiple info calls gives correct format' '
 	cat >expect <<-EOF &&
-	$hello_sha1 blob $hello_size
-	$tree_sha1 tree $tree_size
-	$commit_sha1 commit $commit_size
-	$tag_sha1 tag $tag_size
+	$hello_oid blob $hello_size
+	$tree_oid tree $tree_size
+	$commit_oid commit $commit_size
+	$tag_oid tag $tag_size
 	deadbeef missing
 	EOF
 
@@ -512,19 +512,19 @@ test_expect_success '--batch-command with multiple info calls gives correct form
 	test_cmp expect_nul actual
 '
 
-batch_command_multiple_contents="contents $hello_sha1
-contents $commit_sha1
-contents $tag_sha1
+batch_command_multiple_contents="contents $hello_oid
+contents $commit_oid
+contents $tag_oid
 contents deadbeef
 flush"
 
 test_expect_success '--batch-command with multiple command calls gives correct format' '
 	printf "%s\0" \
-		"$hello_sha1 blob $hello_size" \
+		"$hello_oid blob $hello_size" \
 		"$hello_content" \
-		"$commit_sha1 commit $commit_size" \
+		"$commit_oid commit $commit_size" \
 		"$commit_content" \
-		"$tag_sha1 tag $tag_size" \
+		"$tag_oid tag $tag_size" \
 		"$tag_content" \
 		"deadbeef missing" >expect_nul &&
 	tr "\0" "\n" <expect_nul >expect &&
@@ -569,7 +569,7 @@ test_expect_success 'confirm that neither loose blob is a delta' '
 # we will check only that one of the two objects is a delta
 # against the other, but not the order. We can do so by just
 # asking for the base of both, and checking whether either
-# sha1 appears in the output.
+# oid appears in the output.
 test_expect_success '%(deltabase) reports packed delta bases' '
 	git repack -ad &&
 	git cat-file --batch-check="%(deltabase)" <blobs >actual &&
@@ -583,12 +583,12 @@ test_expect_success 'setup bogus data' '
 	bogus_short_type="bogus" &&
 	bogus_short_content="bogus" &&
 	bogus_short_size=$(strlen "$bogus_short_content") &&
-	bogus_short_sha1=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
+	bogus_short_oid=$(echo_without_newline "$bogus_short_content" | git hash-object -t $bogus_short_type --literally -w --stdin) &&
 
 	bogus_long_type="abcdefghijklmnopqrstuvwxyz1234679" &&
 	bogus_long_content="bogus" &&
 	bogus_long_size=$(strlen "$bogus_long_content") &&
-	bogus_long_sha1=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
+	bogus_long_oid=$(echo_without_newline "$bogus_long_content" | git hash-object -t $bogus_long_type --literally -w --stdin)
 '
 
 for arg1 in '' --allow-unknown-type
@@ -608,9 +608,9 @@ do
 
 			if test "$arg1" = "--allow-unknown-type"
 			then
-				git cat-file $arg1 $arg2 $bogus_short_sha1
+				git cat-file $arg1 $arg2 $bogus_short_oid
 			else
-				test_must_fail git cat-file $arg1 $arg2 $bogus_short_sha1 >out 2>actual &&
+				test_must_fail git cat-file $arg1 $arg2 $bogus_short_oid >out 2>actual &&
 				test_must_be_empty out &&
 				test_cmp expect actual
 			fi
@@ -620,21 +620,21 @@ do
 			if test "$arg2" = "-p"
 			then
 				cat >expect <<-EOF
-				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
-				fatal: Not a valid object name $bogus_long_sha1
+				error: header for $bogus_long_oid too long, exceeds 32 bytes
+				fatal: Not a valid object name $bogus_long_oid
 				EOF
 			else
 				cat >expect <<-EOF
-				error: header for $bogus_long_sha1 too long, exceeds 32 bytes
+				error: header for $bogus_long_oid too long, exceeds 32 bytes
 				fatal: git cat-file: could not get object info
 				EOF
 			fi &&
 
 			if test "$arg1" = "--allow-unknown-type"
 			then
-				git cat-file $arg1 $arg2 $bogus_short_sha1
+				git cat-file $arg1 $arg2 $bogus_short_oid
 			else
-				test_must_fail git cat-file $arg1 $arg2 $bogus_long_sha1 >out 2>actual &&
+				test_must_fail git cat-file $arg1 $arg2 $bogus_long_oid >out 2>actual &&
 				test_must_be_empty out &&
 				test_cmp expect actual
 			fi
@@ -668,28 +668,28 @@ do
 done
 
 test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
-	git cat-file -e $bogus_short_sha1
+	git cat-file -e $bogus_short_oid
 '
 
 test_expect_success '-e can not be combined with --allow-unknown-type' '
-	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_sha1
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_short_oid
 '
 
 test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
-	test_must_fail git cat-file -p $bogus_short_sha1 &&
-	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_sha1
+	test_must_fail git cat-file -p $bogus_short_oid &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_short_oid
 '
 
 test_expect_success '<type> <hash> does not work with objects of broken types' '
 	cat >err.expect <<-\EOF &&
 	fatal: invalid object type "bogus"
 	EOF
-	test_must_fail git cat-file $bogus_short_type $bogus_short_sha1 2>err.actual &&
+	test_must_fail git cat-file $bogus_short_type $bogus_short_oid 2>err.actual &&
 	test_cmp err.expect err.actual
 '
 
 test_expect_success 'broken types combined with --batch and --batch-check' '
-	echo $bogus_short_sha1 >bogus-oid &&
+	echo $bogus_short_oid >bogus-oid &&
 
 	cat >err.expect <<-\EOF &&
 	fatal: invalid object type
@@ -711,52 +711,52 @@ test_expect_success 'the --allow-unknown-type option does not consider replaceme
 	cat >expect <<-EOF &&
 	$bogus_short_type
 	EOF
-	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	git cat-file -t --allow-unknown-type $bogus_short_oid >actual &&
 	test_cmp expect actual &&
 
 	# Create it manually, as "git replace" will die on bogus
 	# types.
 	head=$(git rev-parse --verify HEAD) &&
-	test_when_finished "test-tool ref-store main delete-refs 0 msg refs/replace/$bogus_short_sha1" &&
-	test-tool ref-store main update-ref msg "refs/replace/$bogus_short_sha1" $head $ZERO_OID REF_SKIP_OID_VERIFICATION &&
+	test_when_finished "test-tool ref-store main delete-refs 0 msg refs/replace/$bogus_short_oid" &&
+	test-tool ref-store main update-ref msg "refs/replace/$bogus_short_oid" $head $ZERO_OID REF_SKIP_OID_VERIFICATION &&
 
 	cat >expect <<-EOF &&
 	commit
 	EOF
-	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	git cat-file -t --allow-unknown-type $bogus_short_oid >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_short_type >expect &&
-	git cat-file -t --allow-unknown-type $bogus_short_sha1 >actual &&
+	git cat-file -t --allow-unknown-type $bogus_short_oid >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Size of broken object is correct" '
 	echo $bogus_short_size >expect &&
-	git cat-file -s --allow-unknown-type $bogus_short_sha1 >actual &&
+	git cat-file -s --allow-unknown-type $bogus_short_oid >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'clean up broken object' '
-	rm .git/objects/$(test_oid_to_path $bogus_short_sha1)
+	rm .git/objects/$(test_oid_to_path $bogus_short_oid)
 '
 
 test_expect_success "Type of broken object is correct when type is large" '
 	echo $bogus_long_type >expect &&
-	git cat-file -t --allow-unknown-type $bogus_long_sha1 >actual &&
+	git cat-file -t --allow-unknown-type $bogus_long_oid >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success "Size of large broken object is correct when type is large" '
 	echo $bogus_long_size >expect &&
-	git cat-file -s --allow-unknown-type $bogus_long_sha1 >actual &&
+	git cat-file -s --allow-unknown-type $bogus_long_oid >actual &&
 	test_cmp expect actual
 '
 
 test_expect_success 'clean up broken object' '
-	rm .git/objects/$(test_oid_to_path $bogus_long_sha1)
+	rm .git/objects/$(test_oid_to_path $bogus_long_oid)
 '
 
 test_expect_success 'cat-file -t and -s on corrupt loose object' '
@@ -853,7 +853,7 @@ test_expect_success 'prep for symlink tests' '
 	test_ln_s_add loop2 loop1 &&
 	git add morx dir/subdir/ind2 dir/ind1 &&
 	git commit -am "test" &&
-	echo $hello_sha1 blob $hello_size >found
+	echo $hello_oid blob $hello_size >found
 '
 
 test_expect_success 'git cat-file --batch-check --follow-symlinks works for non-links' '
@@ -941,7 +941,7 @@ test_expect_success 'git cat-file --batch-check --follow-symlinks works for dir/
 	echo HEAD:dirlink/morx >>expect &&
 	echo HEAD:dirlink/morx | git cat-file --batch-check --follow-symlinks >actual &&
 	test_cmp expect actual &&
-	echo $hello_sha1 blob $hello_size >expect &&
+	echo $hello_oid blob $hello_size >expect &&
 	echo HEAD:dirlink/ind1 | git cat-file --batch-check --follow-symlinks >actual &&
 	test_cmp expect actual
 '
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 29/30] t1006: test oid compatibility with cat-file
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (27 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 28/30] t1006: rename sha1 to oid Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2023-10-02  2:40   ` [PATCH v2 30/30] t1016-compatObjectFormat: add tests to verify the conversion between objects Eric W. Biederman
                     ` (2 subsequent siblings)
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

Update the existing tests that are oid based to test that cat-file
works correctly with the normal oid and the compat_oid.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 t/t1006-cat-file.sh | 251 +++++++++++++++++++++++++++-----------------
 1 file changed, 154 insertions(+), 97 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 9b018b538950..23d3d37283bb 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -236,27 +236,38 @@ hello_size=$(strlen "$hello_content")
 hello_oid=$(echo_without_newline "$hello_content" | git hash-object --stdin)
 
 test_expect_success "setup" '
+	git config core.repositoryformatversion 1 &&
+	git config extensions.objectformat $test_hash_algo &&
+	git config extensions.compatobjectformat $test_compat_hash_algo &&
 	echo_without_newline "$hello_content" > hello &&
 	git update-index --add hello
 '
 
-run_tests 'blob' $hello_oid $hello_size "$hello_content" "$hello_content"
+run_blob_tests () {
+    oid=$1
 
-test_expect_success '--batch-command --buffer with flush for blob info' '
-	echo "$hello_oid blob $hello_size" >expect &&
-	test_write_lines "info $hello_oid" "flush" |
+    run_tests 'blob' $oid $hello_size "$hello_content" "$hello_content"
+
+    test_expect_success '--batch-command --buffer with flush for blob info' '
+	echo "$oid blob $hello_size" >expect &&
+	test_write_lines "info $oid" "flush" |
 	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
 	git cat-file --batch-command --buffer >actual &&
 	test_cmp expect actual
-'
+    '
 
-test_expect_success '--batch-command --buffer without flush for blob info' '
+    test_expect_success '--batch-command --buffer without flush for blob info' '
 	touch output &&
-	test_write_lines "info $hello_oid" |
+	test_write_lines "info $oid" |
 	GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT=1 \
 	git cat-file --batch-command --buffer >>output &&
 	test_must_be_empty output
-'
+    '
+}
+
+hello_compat_oid=$(git rev-parse --output-object-format=$test_compat_hash_algo $hello_oid)
+run_blob_tests $hello_oid
+run_blob_tests $hello_compat_oid
 
 test_expect_success '--batch-check without %(rest) considers whole line' '
 	echo "$hello_oid blob $hello_size" >expect &&
@@ -267,35 +278,58 @@ test_expect_success '--batch-check without %(rest) considers whole line' '
 '
 
 tree_oid=$(git write-tree)
+tree_compat_oid=$(git rev-parse --output-object-format=$test_compat_hash_algo $tree_oid)
 tree_size=$(($(test_oid rawsz) + 13))
+tree_compat_size=$(($(test_oid --hash=compat rawsz) + 13))
 tree_pretty_content="100644 blob $hello_oid	hello${LF}"
+tree_compat_pretty_content="100644 blob $hello_compat_oid	hello${LF}"
 
 run_tests 'tree' $tree_oid $tree_size "" "$tree_pretty_content"
+run_tests 'tree' $tree_compat_oid $tree_compat_size "" "$tree_compat_pretty_content"
 
 commit_message="Initial commit"
 commit_oid=$(echo_without_newline "$commit_message" | git commit-tree $tree_oid)
+commit_compat_oid=$(git rev-parse --output-object-format=$test_compat_hash_algo $commit_oid)
 commit_size=$(($(test_oid hexsz) + 137))
+commit_compat_size=$(($(test_oid --hash=compat hexsz) + 137))
 commit_content="tree $tree_oid
 author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> $GIT_AUTHOR_DATE
 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
 
 $commit_message"
 
+commit_compat_content="tree $tree_compat_oid
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> $GIT_AUTHOR_DATE
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+
+$commit_message"
+
 run_tests 'commit' $commit_oid $commit_size "$commit_content" "$commit_content"
+run_tests 'commit' $commit_compat_oid $commit_compat_size "$commit_compat_content" "$commit_compat_content"
 
-tag_header_without_timestamp="object $hello_oid
-type blob
+tag_header_without_oid="type blob
 tag hellotag
 tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
+tag_header_without_timestamp="object $hello_oid
+$tag_header_without_oid"
+tag_compat_header_without_timestamp="object $hello_compat_oid
+$tag_header_without_oid"
 tag_description="This is a tag"
 tag_content="$tag_header_without_timestamp 0 +0000
 
+$tag_description"
+tag_compat_content="$tag_compat_header_without_timestamp 0 +0000
+
 $tag_description"
 
 tag_oid=$(echo_without_newline "$tag_content" | git hash-object -t tag --stdin -w)
 tag_size=$(strlen "$tag_content")
 
+tag_compat_oid=$(git rev-parse --output-object-format=$test_compat_hash_algo $tag_oid)
+tag_compat_size=$(strlen "$tag_compat_content")
+
 run_tests 'tag' $tag_oid $tag_size "$tag_content" "$tag_content"
+run_tests 'tag' $tag_compat_oid $tag_compat_size "$tag_compat_content" "$tag_compat_content"
 
 test_expect_success "Reach a blob from a tag pointing to it" '
 	echo_without_newline "$hello_content" >expect &&
@@ -303,37 +337,43 @@ test_expect_success "Reach a blob from a tag pointing to it" '
 	test_cmp expect actual
 '
 
-for batch in batch batch-check batch-command
+for oid in $hello_oid $hello_compat_oid
 do
-    for opt in t s e p
+    for batch in batch batch-check batch-command
     do
+	for opt in t s e p
+	do
 	test_expect_success "Passing -$opt with --$batch fails" '
-	    test_must_fail git cat-file --$batch -$opt $hello_oid
+	    test_must_fail git cat-file --$batch -$opt $oid
 	'
 
 	test_expect_success "Passing --$batch with -$opt fails" '
-	    test_must_fail git cat-file -$opt --$batch $hello_oid
+	    test_must_fail git cat-file -$opt --$batch $oid
 	'
-    done
+	done
 
-    test_expect_success "Passing <type> with --$batch fails" '
-	test_must_fail git cat-file --$batch blob $hello_oid
-    '
+	test_expect_success "Passing <type> with --$batch fails" '
+	test_must_fail git cat-file --$batch blob $oid
+	'
 
-    test_expect_success "Passing --$batch with <type> fails" '
-	test_must_fail git cat-file blob --$batch $hello_oid
-    '
+	test_expect_success "Passing --$batch with <type> fails" '
+	test_must_fail git cat-file blob --$batch $oid
+	'
 
-    test_expect_success "Passing oid with --$batch fails" '
-	test_must_fail git cat-file --$batch $hello_oid
-    '
+	test_expect_success "Passing oid with --$batch fails" '
+	test_must_fail git cat-file --$batch $oid
+	'
+    done
 done
 
-for opt in t s e p
+for oid in $hello_oid $hello_compat_oid
 do
-    test_expect_success "Passing -$opt with --follow-symlinks fails" '
-	    test_must_fail git cat-file --follow-symlinks -$opt $hello_oid
+    for opt in t s e p
+    do
+	test_expect_success "Passing -$opt with --follow-symlinks fails" '
+	    test_must_fail git cat-file --follow-symlinks -$opt $oid
 	'
+    done
 done
 
 test_expect_success "--batch-check for a non-existent named object" '
@@ -386,112 +426,102 @@ test_expect_success 'empty --batch-check notices missing object' '
 	test_cmp expect actual
 '
 
-batch_input="$hello_oid
-$commit_oid
-$tag_oid
+batch_tests () {
+    boid=$1
+    loid=$2
+    lsize=$3
+    coid=$4
+    csize=$5
+    ccontent=$6
+    toid=$7
+    tsize=$8
+    tcontent=$9
+
+    batch_input="$boid
+$coid
+$toid
 deadbeef
 
 "
 
-printf "%s\0" \
-	"$hello_oid blob $hello_size" \
+    printf "%s\0" \
+	"$boid blob $hello_size" \
 	"$hello_content" \
-	"$commit_oid commit $commit_size" \
-	"$commit_content" \
-	"$tag_oid tag $tag_size" \
-	"$tag_content" \
+	"$coid commit $csize" \
+	"$ccontent" \
+	"$toid tag $tsize" \
+	"$tcontent" \
 	"deadbeef missing" \
 	" missing" >batch_output
 
-test_expect_success '--batch with multiple oids gives correct format' '
+    test_expect_success '--batch with multiple oids gives correct format' '
 	tr "\0" "\n" <batch_output >expect &&
 	echo_without_newline "$batch_input" >in &&
 	git cat-file --batch <in >actual &&
 	test_cmp expect actual
-'
+    '
 
-test_expect_success '--batch, -z with multiple oids gives correct format' '
+    test_expect_success '--batch, -z with multiple oids gives correct format' '
 	echo_without_newline_nul "$batch_input" >in &&
 	tr "\0" "\n" <batch_output >expect &&
 	git cat-file --batch -z <in >actual &&
 	test_cmp expect actual
-'
+    '
 
-test_expect_success '--batch, -Z with multiple oids gives correct format' '
+    test_expect_success '--batch, -Z with multiple oids gives correct format' '
 	echo_without_newline_nul "$batch_input" >in &&
 	git cat-file --batch -Z <in >actual &&
 	test_cmp batch_output actual
-'
+    '
 
-batch_check_input="$hello_oid
-$tree_oid
-$commit_oid
-$tag_oid
+batch_check_input="$boid
+$loid
+$coid
+$toid
 deadbeef
 
 "
 
-printf "%s\0" \
-	"$hello_oid blob $hello_size" \
-	"$tree_oid tree $tree_size" \
-	"$commit_oid commit $commit_size" \
-	"$tag_oid tag $tag_size" \
+    printf "%s\0" \
+	"$boid blob $hello_size" \
+	"$loid tree $lsize" \
+	"$coid commit $csize" \
+	"$toid tag $tsize" \
 	"deadbeef missing" \
 	" missing" >batch_check_output
 
-test_expect_success "--batch-check with multiple oids gives correct format" '
+    test_expect_success "--batch-check with multiple oids gives correct format" '
 	tr "\0" "\n" <batch_check_output >expect &&
 	echo_without_newline "$batch_check_input" >in &&
 	git cat-file --batch-check <in >actual &&
 	test_cmp expect actual
-'
+    '
 
-test_expect_success "--batch-check, -z with multiple oids gives correct format" '
+    test_expect_success "--batch-check, -z with multiple oids gives correct format" '
 	tr "\0" "\n" <batch_check_output >expect &&
 	echo_without_newline_nul "$batch_check_input" >in &&
 	git cat-file --batch-check -z <in >actual &&
 	test_cmp expect actual
-'
+    '
 
-test_expect_success "--batch-check, -Z with multiple oids gives correct format" '
+    test_expect_success "--batch-check, -Z with multiple oids gives correct format" '
 	echo_without_newline_nul "$batch_check_input" >in &&
 	git cat-file --batch-check -Z <in >actual &&
 	test_cmp batch_check_output actual
-'
-
-test_expect_success FUNNYNAMES 'setup with newline in input' '
-	touch -- "newline${LF}embedded" &&
-	git add -- "newline${LF}embedded" &&
-	git commit -m "file with newline embedded" &&
-	test_tick &&
-
-	printf "HEAD:newline${LF}embedded" >in
-'
-
-test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
-	git cat-file --batch-check -z <in >actual &&
-	echo "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
-	test_cmp expect actual
-'
-
-test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
-	git cat-file --batch-check -Z <in >actual &&
-	printf "%s\0" "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
-	test_cmp expect actual
-'
+    '
 
-batch_command_multiple_info="info $hello_oid
-info $tree_oid
-info $commit_oid
-info $tag_oid
+batch_command_multiple_info="info $boid
+info $loid
+info $coid
+info $toid
 info deadbeef"
 
-test_expect_success '--batch-command with multiple info calls gives correct format' '
+    test_expect_success '--batch-command with multiple info calls gives correct format' '
 	cat >expect <<-EOF &&
-	$hello_oid blob $hello_size
-	$tree_oid tree $tree_size
-	$commit_oid commit $commit_size
-	$tag_oid tag $tag_size
+	$boid blob $hello_size
+	$loid tree $lsize
+	$coid commit $csize
+	$toid tag $tsize
 	deadbeef missing
 	EOF
 
@@ -510,22 +540,22 @@ test_expect_success '--batch-command with multiple info calls gives correct form
 	git cat-file --batch-command --buffer -Z <in >actual &&
 
 	test_cmp expect_nul actual
-'
+    '
 
-batch_command_multiple_contents="contents $hello_oid
-contents $commit_oid
-contents $tag_oid
+batch_command_multiple_contents="contents $boid
+contents $coid
+contents $toid
 contents deadbeef
 flush"
 
-test_expect_success '--batch-command with multiple command calls gives correct format' '
+    test_expect_success '--batch-command with multiple command calls gives correct format' '
 	printf "%s\0" \
-		"$hello_oid blob $hello_size" \
+		"$boid blob $hello_size" \
 		"$hello_content" \
-		"$commit_oid commit $commit_size" \
-		"$commit_content" \
-		"$tag_oid tag $tag_size" \
-		"$tag_content" \
+		"$coid commit $csize" \
+		"$ccontent" \
+		"$toid tag $tsize" \
+		"$tcontent" \
 		"deadbeef missing" >expect_nul &&
 	tr "\0" "\n" <expect_nul >expect &&
 
@@ -543,6 +573,33 @@ test_expect_success '--batch-command with multiple command calls gives correct f
 	git cat-file --batch-command --buffer -Z <in >actual &&
 
 	test_cmp expect_nul actual
+    '
+
+}
+
+batch_tests $hello_oid $tree_oid $tree_size $commit_oid $commit_size "$commit_content" $tag_oid $tag_size "$tag_content"
+batch_tests $hello_compat_oid $tree_compat_oid $tree_compat_size $commit_compat_oid $commit_compat_size "$commit_compat_content" $tag_compat_oid $tag_compat_size "$tag_compat_content"
+
+
+test_expect_success FUNNYNAMES 'setup with newline in input' '
+	touch -- "newline${LF}embedded" &&
+	git add -- "newline${LF}embedded" &&
+	git commit -m "file with newline embedded" &&
+	test_tick &&
+
+	printf "HEAD:newline${LF}embedded" >in
+'
+
+test_expect_success FUNNYNAMES '--batch-check, -z with newline in input' '
+	git cat-file --batch-check -z <in >actual &&
+	echo "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success FUNNYNAMES '--batch-check, -Z with newline in input' '
+	git cat-file --batch-check -Z <in >actual &&
+	printf "%s\0" "$(git rev-parse "HEAD:newline${LF}embedded") blob 0" >expect &&
+	test_cmp expect actual
 '
 
 test_expect_success 'setup blobs which are likely to delta' '
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* [PATCH v2 30/30] t1016-compatObjectFormat: add tests to verify the conversion between objects
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (28 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 29/30] t1006: test oid compatibility with cat-file Eric W. Biederman
@ 2023-10-02  2:40   ` Eric W. Biederman
  2024-02-07 22:18   ` [PATCH v2 00/30] initial support for multiple hash functions Junio C Hamano
  2024-02-15 11:27   ` Patrick Steinhardt
  31 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2023-10-02  2:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

From: "Eric W. Biederman" <ebiederm@xmission.com>

For now my strategy is simple.  Create two identical repositories one
in each format.  Use fixed timestamps. Verify the dynamically computed
compatibility objects from one repository match the objects stored in
the other repository.

A general limitation of this strategy is that the git when generating
signed tags and commits with compatObjectFormat enabled will generate
a signature for both formats.  To overcome this limitation I have
added "test-tool delete-gpgsig" that when fed an signed commit or tag
with two signatures deletes one of the signatures.

With that in place I can have "git commit" and  "git tag" generate
signed objects, have my tool delete one, and feed the new object
into "git hash-object" to create the kinds of commits and tags
git without compatObjectFormat enabled will generate.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Makefile                      |   1 +
 t/helper/test-delete-gpgsig.c |  62 ++++++++
 t/helper/test-tool.c          |   1 +
 t/helper/test-tool.h          |   1 +
 t/t1016-compatObjectFormat.sh | 281 ++++++++++++++++++++++++++++++++++
 t/t1016/gpg                   |   2 +
 6 files changed, 348 insertions(+)
 create mode 100644 t/helper/test-delete-gpgsig.c
 create mode 100755 t/t1016-compatObjectFormat.sh
 create mode 100755 t/t1016/gpg

diff --git a/Makefile b/Makefile
index 3c18664def9a..3e4444fb9ab2 100644
--- a/Makefile
+++ b/Makefile
@@ -790,6 +790,7 @@ TEST_BUILTINS_OBJS += test-crontab.o
 TEST_BUILTINS_OBJS += test-csprng.o
 TEST_BUILTINS_OBJS += test-ctype.o
 TEST_BUILTINS_OBJS += test-date.o
+TEST_BUILTINS_OBJS += test-delete-gpgsig.o
 TEST_BUILTINS_OBJS += test-delta.o
 TEST_BUILTINS_OBJS += test-dir-iterator.o
 TEST_BUILTINS_OBJS += test-drop-caches.o
diff --git a/t/helper/test-delete-gpgsig.c b/t/helper/test-delete-gpgsig.c
new file mode 100644
index 000000000000..e36831af03f6
--- /dev/null
+++ b/t/helper/test-delete-gpgsig.c
@@ -0,0 +1,62 @@
+#include "test-tool.h"
+#include "gpg-interface.h"
+#include "strbuf.h"
+
+
+int cmd__delete_gpgsig(int argc, const char **argv)
+{
+	struct strbuf buf = STRBUF_INIT;
+	const char *pattern = "gpgsig";
+	const char *bufptr, *tail, *eol;
+	int deleting = 0;
+	size_t plen;
+
+	if (argc >= 2) {
+		pattern = argv[1];
+		argv++;
+		argc--;
+	}
+
+	plen = strlen(pattern);
+	strbuf_read(&buf, 0, 0);
+
+	if (!strcmp(pattern, "trailer")) {
+		size_t payload_size = parse_signed_buffer(buf.buf, buf.len);
+		fwrite(buf.buf, 1, payload_size, stdout);
+		fflush(stdout);
+		return 0;
+	}
+
+	bufptr = buf.buf;
+	tail = bufptr + buf.len;
+
+	while (bufptr < tail) {
+		/* Find the end of the line */
+		eol = memchr(bufptr, '\n', tail - bufptr);
+		if (!eol)
+			eol = tail;
+
+		/* Drop continuation lines */
+		if (deleting && (bufptr < eol) && (bufptr[0] == ' ')) {
+			bufptr = eol + 1;
+			continue;
+		}
+		deleting = 0;
+
+		/* Does the line match the prefix? */
+		if (((bufptr + plen) < eol) &&
+		    !memcmp(bufptr, pattern, plen) &&
+		    (bufptr[plen] == ' ')) {
+			deleting = 1;
+			bufptr = eol + 1;
+			continue;
+		}
+
+		/* Print all other lines */
+		fwrite(bufptr, 1, (eol - bufptr) + 1, stdout);
+		bufptr = eol + 1;
+	}
+	fflush(stdout);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index abe8a785eb65..8b6c84f202d6 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -21,6 +21,7 @@ static struct test_cmd cmds[] = {
 	{ "csprng", cmd__csprng },
 	{ "ctype", cmd__ctype },
 	{ "date", cmd__date },
+	{ "delete-gpgsig", cmd__delete_gpgsig },
 	{ "delta", cmd__delta },
 	{ "dir-iterator", cmd__dir_iterator },
 	{ "drop-caches", cmd__drop_caches },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ea2672436c9a..76baaece35b9 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -15,6 +15,7 @@ int cmd__csprng(int argc, const char **argv);
 int cmd__ctype(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
+int cmd__delete_gpgsig(int argc, const char **argv);
 int cmd__dir_iterator(int argc, const char **argv);
 int cmd__drop_caches(int argc, const char **argv);
 int cmd__dump_cache_tree(int argc, const char **argv);
diff --git a/t/t1016-compatObjectFormat.sh b/t/t1016-compatObjectFormat.sh
new file mode 100755
index 000000000000..8132cd37b8c8
--- /dev/null
+++ b/t/t1016-compatObjectFormat.sh
@@ -0,0 +1,281 @@
+#!/bin/sh
+#
+# Copyright (c) 2023 Eric Biederman
+#
+
+test_description='Test how well compatObjectFormat works'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-gpg.sh
+
+# All of the follow variables must be defined in the environment:
+# GIT_AUTHOR_NAME
+# GIT_AUTHOR_EMAIL
+# GIT_AUTHOR_DATE
+# GIT_COMMITTER_NAME
+# GIT_COMMITTER_EMAIL
+# GIT_COMMITTER_DATE
+#
+# The test relies on these variables being set so that the two
+# different commits in two different repositories encoded with two
+# different hash functions result in the same content in the commits.
+# This means that when the commit is translated between hash functions
+# the commit is identical to the commit in the other repository.
+
+compat_hash () {
+    case "$1" in
+    "sha1")
+	echo "sha256"
+	;;
+    "sha256")
+	echo "sha1"
+	;;
+    esac
+}
+
+hello_oid () {
+    case "$1" in
+    "sha1")
+	echo "$hello_sha1_oid"
+	;;
+    "sha256")
+	echo "$hello_sha256_oid"
+	;;
+    esac
+}
+
+tree_oid () {
+    case "$1" in
+    "sha1")
+	echo "$tree_sha1_oid"
+	;;
+    "sha256")
+	echo "$tree_sha256_oid"
+	;;
+    esac
+}
+
+commit_oid () {
+    case "$1" in
+    "sha1")
+	echo "$commit_sha1_oid"
+	;;
+    "sha256")
+	echo "$commit_sha256_oid"
+	;;
+    esac
+}
+
+commit2_oid () {
+    case "$1" in
+    "sha1")
+	echo "$commit2_sha1_oid"
+	;;
+    "sha256")
+	echo "$commit2_sha256_oid"
+	;;
+    esac
+}
+
+del_sigcommit () {
+    local delete=$1
+
+    if test "$delete" = "sha256" ; then
+	local pattern="gpgsig-sha256"
+    else
+	local pattern="gpgsig"
+    fi
+    test-tool delete-gpgsig "$pattern"
+}
+
+
+del_sigtag () {
+    local storage=$1
+    local delete=$2
+
+    if test "$storage" = "$delete" ; then
+	local pattern="trailer"
+    elif test "$storage" = "sha256" ; then
+	local pattern="gpgsig"
+    else
+	local pattern="gpgsig-sha256"
+    fi
+    test-tool delete-gpgsig "$pattern"
+}
+
+base=$(pwd)
+for hash in sha1 sha256
+do
+	cd "$base"
+	mkdir -p repo-$hash
+	cd repo-$hash
+
+	test_expect_success "setup $hash repository" '
+		git init --object-format=$hash &&
+		git config core.repositoryformatversion 1 &&
+		git config extensions.objectformat $hash &&
+		git config extensions.compatobjectformat $(compat_hash $hash) &&
+		git config gpg.program $TEST_DIRECTORY/t1016/gpg &&
+		echo "Hellow World!" > hello &&
+		eval hello_${hash}_oid=$(git hash-object hello) &&
+		git update-index --add hello &&
+		git commit -m "Initial commit" &&
+		eval commit_${hash}_oid=$(git rev-parse HEAD) &&
+		eval tree_${hash}_oid=$(git rev-parse HEAD^{tree})
+	'
+	test_expect_success "create a $hash  tagged blob" '
+		git tag --no-sign -m "This is a tag" hellotag $(hello_oid $hash) &&
+		eval hellotag_${hash}_oid=$(git rev-parse hellotag)
+	'
+	test_expect_success "create a $hash tagged tree" '
+		git tag --no-sign -m "This is a tag" treetag $(tree_oid $hash) &&
+		eval treetag_${hash}_oid=$(git rev-parse treetag)
+	'
+	test_expect_success "create a $hash tagged commit" '
+		git tag --no-sign -m "This is a tag" committag $(commit_oid $hash) &&
+		eval committag_${hash}_oid=$(git rev-parse committag)
+	'
+	test_expect_success GPG2 "create a $hash signed commit" '
+		git commit --gpg-sign --allow-empty -m "This is a signed commit" &&
+		eval signedcommit_${hash}_oid=$(git rev-parse HEAD)
+	'
+	test_expect_success GPG2 "create a $hash signed tag" '
+		git tag -s -m "This is a signed tag" signedtag HEAD &&
+		eval signedtag_${hash}_oid=$(git rev-parse signedtag)
+	'
+	test_expect_success "create a $hash branch" '
+		git checkout -b branch $(commit_oid $hash) &&
+		echo "More more more give me more!" > more &&
+		eval more_${hash}_oid=$(git hash-object more) &&
+		echo "Another and another and another" > another &&
+		eval another_${hash}_oid=$(git hash-object another) &&
+		git update-index --add more another &&
+		git commit -m "Add more files!" &&
+		eval commit2_${hash}_oid=$(git rev-parse HEAD) &&
+		eval tree2_${hash}_oid=$(git rev-parse HEAD^{tree})
+	'
+	test_expect_success GPG2 "create another $hash signed tag" '
+		git tag -s -m "This is another signed tag" signedtag2 $(commit2_oid $hash) &&
+		eval signedtag2_${hash}_oid=$(git rev-parse signedtag2)
+	'
+	test_expect_success GPG2 "merge the $hash branches together" '
+		git merge -S -m "merge some signed tags together" signedtag signedtag2 &&
+		eval signedcommit2_${hash}_oid=$(git rev-parse HEAD)
+	'
+	test_expect_success GPG2 "create additional $hash signed commits" '
+		git commit --gpg-sign --allow-empty -m "This is an additional signed commit" &&
+		git cat-file commit HEAD | del_sigcommit sha256 > "../${hash}_signedcommit3" &&
+		git cat-file commit HEAD | del_sigcommit sha1 > "../${hash}_signedcommit4" &&
+		eval signedcommit3_${hash}_oid=$(git hash-object -t commit -w ../${hash}_signedcommit3) &&
+		eval signedcommit4_${hash}_oid=$(git hash-object -t commit -w ../${hash}_signedcommit4)
+	'
+	test_expect_success GPG2 "create additional $hash signed tags" '
+		git tag -s -m "This is an additional signed tag" signedtag34 HEAD &&
+		git cat-file tag signedtag34 | del_sigtag "${hash}" sha256 > ../${hash}_signedtag3 &&
+		git cat-file tag signedtag34 | del_sigtag "${hash}" sha1 > ../${hash}_signedtag4 &&
+		eval signedtag3_${hash}_oid=$(git hash-object -t tag -w ../${hash}_signedtag3) &&
+		eval signedtag4_${hash}_oid=$(git hash-object -t tag -w ../${hash}_signedtag4)
+	'
+done
+cd "$base"
+
+compare_oids () {
+    test "$#" = 5 && { local PREREQ=$1; shift; } || PREREQ=
+    local type="$1"
+    local name="$2"
+    local sha1_oid="$3"
+    local sha256_oid="$4"
+
+    echo ${sha1_oid} > ${name}_sha1_expected
+    echo ${sha256_oid} > ${name}_sha256_expected
+    echo ${type} > ${name}_type_expected
+
+    git --git-dir=repo-sha1/.git rev-parse --output-object-format=sha256 ${sha1_oid} > ${name}_sha1_sha256_found
+    git --git-dir=repo-sha256/.git rev-parse --output-object-format=sha1 ${sha256_oid} > ${name}_sha256_sha1_found
+    local sha1_sha256_oid=$(cat ${name}_sha1_sha256_found)
+    local sha256_sha1_oid=$(cat ${name}_sha256_sha1_found)
+
+    test_expect_success $PREREQ "Verify ${type} ${name}'s sha1 oid" '
+	git --git-dir=repo-sha256/.git rev-parse --output-object-format=sha1 ${sha256_oid} > ${name}_sha1 &&
+	test_cmp ${name}_sha1 ${name}_sha1_expected
+'
+
+    test_expect_success $PREREQ "Verify ${type} ${name}'s sha256 oid" '
+	git --git-dir=repo-sha1/.git rev-parse --output-object-format=sha256 ${sha1_oid} > ${name}_sha256 &&
+	test_cmp ${name}_sha256 ${name}_sha256_expected
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha1 type" '
+	git --git-dir=repo-sha1/.git cat-file -t ${sha1_oid} > ${name}_type1 &&
+	git --git-dir=repo-sha256/.git cat-file -t ${sha256_sha1_oid} > ${name}_type2 &&
+	test_cmp ${name}_type1 ${name}_type2 &&
+	test_cmp ${name}_type1 ${name}_type_expected
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha256 type" '
+	git --git-dir=repo-sha256/.git cat-file -t ${sha256_oid} > ${name}_type3 &&
+	git --git-dir=repo-sha1/.git cat-file -t ${sha1_sha256_oid} > ${name}_type4 &&
+	test_cmp ${name}_type3 ${name}_type4 &&
+	test_cmp ${name}_type3 ${name}_type_expected
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha1 size" '
+	git --git-dir=repo-sha1/.git cat-file -s ${sha1_oid} > ${name}_size1 &&
+	git --git-dir=repo-sha256/.git cat-file -s ${sha256_sha1_oid} > ${name}_size2 &&
+	test_cmp ${name}_size1 ${name}_size2
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha256 size" '
+	git --git-dir=repo-sha256/.git cat-file -s ${sha256_oid} > ${name}_size3 &&
+	git --git-dir=repo-sha1/.git cat-file -s ${sha1_sha256_oid} > ${name}_size4 &&
+	test_cmp ${name}_size3 ${name}_size4
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha1 pretty content" '
+	git --git-dir=repo-sha1/.git cat-file -p ${sha1_oid} > ${name}_content1 &&
+	git --git-dir=repo-sha256/.git cat-file -p ${sha256_sha1_oid} > ${name}_content2 &&
+	test_cmp ${name}_content1 ${name}_content2
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha256 pretty content" '
+	git --git-dir=repo-sha256/.git cat-file -p ${sha256_oid} > ${name}_content3 &&
+	git --git-dir=repo-sha1/.git cat-file -p ${sha1_sha256_oid} > ${name}_content4 &&
+	test_cmp ${name}_content3 ${name}_content4
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha1 content" '
+	git --git-dir=repo-sha1/.git cat-file ${type} ${sha1_oid} > ${name}_content5 &&
+	git --git-dir=repo-sha256/.git cat-file ${type} ${sha256_sha1_oid} > ${name}_content6 &&
+	test_cmp ${name}_content5 ${name}_content6
+'
+
+    test_expect_success $PREREQ "Verify ${name}'s sha256 content" '
+	git --git-dir=repo-sha256/.git cat-file ${type} ${sha256_oid} > ${name}_content7 &&
+	git --git-dir=repo-sha1/.git cat-file ${type} ${sha1_sha256_oid} > ${name}_content8 &&
+	test_cmp ${name}_content7 ${name}_content8
+'
+
+}
+
+compare_oids 'blob' hello "$hello_sha1_oid" "$hello_sha256_oid"
+compare_oids 'tree' tree "$tree_sha1_oid" "$tree_sha256_oid"
+compare_oids 'commit' commit "$commit_sha1_oid" "$commit_sha256_oid"
+compare_oids GPG2 'commit' signedcommit "$signedcommit_sha1_oid" "$signedcommit_sha256_oid"
+compare_oids 'tag' hellotag "$hellotag_sha1_oid" "$hellotag_sha256_oid"
+compare_oids 'tag' treetag "$treetag_sha1_oid" "$treetag_sha256_oid"
+compare_oids 'tag' committag "$committag_sha1_oid" "$committag_sha256_oid"
+compare_oids GPG2 'tag' signedtag "$signedtag_sha1_oid" "$signedtag_sha256_oid"
+
+compare_oids 'blob' more "$more_sha1_oid" "$more_sha256_oid"
+compare_oids 'blob' another "$another_sha1_oid" "$another_sha256_oid"
+compare_oids 'tree' tree2 "$tree2_sha1_oid" "$tree2_sha256_oid"
+compare_oids 'commit' commit2 "$commit2_sha1_oid" "$commit2_sha256_oid"
+compare_oids GPG2 'tag' signedtag2 "$signedtag2_sha1_oid" "$signedtag2_sha256_oid"
+compare_oids GPG2 'commit' signedcommit2 "$signedcommit2_sha1_oid" "$signedcommit2_sha256_oid"
+compare_oids GPG2 'commit' signedcommit3 "$signedcommit3_sha1_oid" "$signedcommit3_sha256_oid"
+compare_oids GPG2 'commit' signedcommit4 "$signedcommit4_sha1_oid" "$signedcommit4_sha256_oid"
+compare_oids GPG2 'tag' signedtag3 "$signedtag3_sha1_oid" "$signedtag3_sha256_oid"
+compare_oids GPG2 'tag' signedtag4 "$signedtag4_sha1_oid" "$signedtag4_sha256_oid"
+
+test_done
diff --git a/t/t1016/gpg b/t/t1016/gpg
new file mode 100755
index 000000000000..2601cb18a5b3
--- /dev/null
+++ b/t/t1016/gpg
@@ -0,0 +1,2 @@
+#!/bin/sh
+exec gpg --faked-system-time "20230918T154812" "$@"
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 00/30] initial support for multiple hash functions
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (29 preceding siblings ...)
  2023-10-02  2:40   ` [PATCH v2 30/30] t1016-compatObjectFormat: add tests to verify the conversion between objects Eric W. Biederman
@ 2024-02-07 22:18   ` Junio C Hamano
  2024-02-08  0:24     ` Linus Arver
  2024-02-15 11:27   ` Patrick Steinhardt
  31 siblings, 1 reply; 104+ messages in thread
From: Junio C Hamano @ 2024-02-07 22:18 UTC (permalink / raw)
  To: git; +Cc: Eric W. Biederman, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

"Eric W. Biederman" <ebiederm@gmail.com> writes:

> This addresses all of the known test failures from v1 of this set of
> changes.  In particular I have reworked commit_tree_extended which
> was flagged by smatch, -Werror=array-bounds, and the leak detector.
>
> One functional bug was fixed in repo_for_each_abbrev where it was
> mistakenly displaying too many ambiguous oids.
>
> I am posting this so that people review and testing of this patchset
> won't be distracted by the known and fixed issues.

We haven't seen any reviews on this second round, and have had it
outside 'next' for too long.  I am tempted to say that we merge it
to 'next' and see if anybody screams at this point.

Thanks.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 00/30] initial support for multiple hash functions
  2024-02-07 22:18   ` [PATCH v2 00/30] initial support for multiple hash functions Junio C Hamano
@ 2024-02-08  0:24     ` Linus Arver
  2024-02-08  6:11       ` Patrick Steinhardt
  2024-02-14  7:36       ` Linus Arver
  0 siblings, 2 replies; 104+ messages in thread
From: Linus Arver @ 2024-02-08  0:24 UTC (permalink / raw)
  To: Junio C Hamano, git
  Cc: Eric W. Biederman, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>
>> This addresses all of the known test failures from v1 of this set of
>> changes.  In particular I have reworked commit_tree_extended which
>> was flagged by smatch, -Werror=array-bounds, and the leak detector.
>>
>> One functional bug was fixed in repo_for_each_abbrev where it was
>> mistakenly displaying too many ambiguous oids.
>>
>> I am posting this so that people review and testing of this patchset
>> won't be distracted by the known and fixed issues.
>
> We haven't seen any reviews on this second round, and have had it
> outside 'next' for too long.  I am tempted to say that we merge it
> to 'next' and see if anybody screams at this point.

FWIW out of all the "Needs review" topics this one seemed like the most
deserving of another pair of eyes, and I was planning to review some of
the patches here this week + the weekend. If my review takes too long
(taking longer than this weekend) I can give another update next week
saying "too hard for me, please don't wait for me" to unblock you from
merging to next.

Thanks.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 00/30] initial support for multiple hash functions
  2024-02-08  0:24     ` Linus Arver
@ 2024-02-08  6:11       ` Patrick Steinhardt
  2024-02-14  7:36       ` Linus Arver
  1 sibling, 0 replies; 104+ messages in thread
From: Patrick Steinhardt @ 2024-02-08  6:11 UTC (permalink / raw)
  To: Linus Arver
  Cc: Junio C Hamano, git, Eric W. Biederman, brian m. carlson,
	Eric Sunshine, Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 1423 bytes --]

On Wed, Feb 07, 2024 at 04:24:13PM -0800, Linus Arver wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > "Eric W. Biederman" <ebiederm@gmail.com> writes:
> >
> >> This addresses all of the known test failures from v1 of this set of
> >> changes.  In particular I have reworked commit_tree_extended which
> >> was flagged by smatch, -Werror=array-bounds, and the leak detector.
> >>
> >> One functional bug was fixed in repo_for_each_abbrev where it was
> >> mistakenly displaying too many ambiguous oids.
> >>
> >> I am posting this so that people review and testing of this patchset
> >> won't be distracted by the known and fixed issues.
> >
> > We haven't seen any reviews on this second round, and have had it
> > outside 'next' for too long.  I am tempted to say that we merge it
> > to 'next' and see if anybody screams at this point.
> 
> FWIW out of all the "Needs review" topics this one seemed like the most
> deserving of another pair of eyes, and I was planning to review some of
> the patches here this week + the weekend. If my review takes too long
> (taking longer than this weekend) I can give another update next week
> saying "too hard for me, please don't wait for me" to unblock you from
> merging to next.

I completely lost track of this patch series. So same for me: I don't
want to hold it up, but would be happy to give it a pair of eyes next
week.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 01/30] object-file-convert: stubs for converting from one object format to another
  2023-10-02  2:40   ` [PATCH v2 01/30] object-file-convert: stubs for converting from one object format to another Eric W. Biederman
@ 2024-02-08  8:23     ` Linus Arver
  2024-02-15 11:21     ` Patrick Steinhardt
  1 sibling, 0 replies; 104+ messages in thread
From: Linus Arver @ 2024-02-08  8:23 UTC (permalink / raw)
  To: Eric W. Biederman, Junio C Hamano
  Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

"Eric W. Biederman" <ebiederm@gmail.com> writes:

> From: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Two basic functions are provided:
> - convert_object_file Takes an object file it's type and hash algorithm
>   and converts it into the equivalent object file that would
>   have been generated with hash algorithm "to".

Should probably be

    - convert_object_file takes an object file, its type, and hash algorithm
      and converts it into the equivalent object file using hash
      algorithm "to".

It would be nice if you gave the function name some clue of what sort of
conversion is being done though. Something like
"convert_object_file_hash" or "switch_object_file_hash". I like "switch"
because the word "hash" itself means to convert some input bytes into
another set of (aka "hashed") bytes, and the indirect shadowing of
"convert" in this way can be avoided here by using a different word.The
whole point of this function is to switch the hashing scheme from one to
another, while still keeping everything else the same, so "switch" seems
more appropriate.

Unless, of course, we already pervasively use "convert" this way
elsewhere in the codebase (I have not checked).

>   For blob objects there is no conversation to be done and it is an

s/conversation/conversion

>   error to use this function on them.

Could you explain why no conversion is needed for blob objects, and also
why it should be an error (and not just a NOP)?

In the code we can also call BUG() if the from/to algos are the same.
It's probably worth mentioning in here as well?

Also for such detailed explanations, I think it's much better
to place them as comments directly above the function (and only mention
the important bits about these helper functions, other than the fact
that they will come in handy in later patches, in the log message).

>   For commit, tree, and tag objects embedded oids are replaced by the
>   oids of the objects they refer to with those objects and their
>   object ids reencoded in with the hash algorithm "to".

That's a little wordy. I assume embedded oids just mean oids that these
objects refer to (e.g., commit objects have oids, but for example the
tree referred by a commit would be an example of an embedded oid). If
so, then how about just

    For commit, tree, and tag objects both their oids and embedded
    (dependent) oids are converted using hash algorithm "to".

I dropped "reencoded" in favor of "converted" btecause that's the verb
you use in your function name "convert_object_file()".

>   Signatures
>   are rearranged so that they remain valid after the object has
>   been reencoded.

Maybe s/reencoded/converted here as well? But also, it sounds odd to me
that signatures are simply "rearranged" and not "regenerated" or
"recreated" becaues "rearranged" means keeping most of the old stuff
around but just repositioning them, which doesn't sound like it's doing
justice to the meaning of a hash algo transition.

> - repo_oid_to_algop which takes an oid that refers to an object file
>   and returns the oid of the equivalent object file generated
>   with the target hash algorithm.

The name is odd to me because "repo_oid" doesn't make sense (a repo
is not an object so it doesn't have an oid), but also the "to_algop"
name (AFAICS "algop" just means "pointer to algorithm" in the codebase,
for examle in <hash-ll.h>).

Here's a possible rewording:

    - switch_oid_hash takes an object file's oid and returns a new one
      using hash algorithm "to".

> The pair of files object-file-convert.c and object-file-convert.h are
> introduced to hold as much of this logic as possible to keep this
> conversion logic cleanly separated from everything else and in the
> hopes that someday the code will be clean enough git can support

Did you mean "clean enough so that Git ..."?

> compiling out support for sha1 and the various conversion functions.

FYI you can cut down this ambitious sentence into two for readability,
like this:

    The new files object-file-convert.{c,h} hold as much of this logic
    as possible to keep this conversion logic cleanly separated from
    everything else. This separation will help us easily phase out SHA1
    support (perhaps as a compile-time flag) in the future.

> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  Makefile              |  1 +
>  object-file-convert.c | 57 +++++++++++++++++++++++++++++++++++++++++++
>  object-file-convert.h | 24 ++++++++++++++++++
>  3 files changed, 82 insertions(+)
>  create mode 100644 object-file-convert.c
>  create mode 100644 object-file-convert.h
>
> diff --git a/Makefile b/Makefile
> index 577630936535..f7e824f25cda 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1073,6 +1073,7 @@ LIB_OBJS += notes-cache.o
>  LIB_OBJS += notes-merge.o
>  LIB_OBJS += notes-utils.o
>  LIB_OBJS += notes.o
> +LIB_OBJS += object-file-convert.o
>  LIB_OBJS += object-file.o
>  LIB_OBJS += object-name.o
>  LIB_OBJS += object.o
> diff --git a/object-file-convert.c b/object-file-convert.c
> new file mode 100644
> index 000000000000..4777aba83636
> --- /dev/null
> +++ b/object-file-convert.c
> @@ -0,0 +1,57 @@
> +#include "git-compat-util.h"
> +#include "gettext.h"
> +#include "strbuf.h"
> +#include "repository.h"
> +#include "hash-ll.h"
> +#include "object.h"
> +#include "object-file-convert.h"
> +
> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> +		      const struct git_hash_algo *to, struct object_id *dest)
> +{
> +	/*
> +	 * If the source algorithm is not set, then we're using the
> +	 * default hash algorithm for that object.
> +	 */
> +	const struct git_hash_algo *from =
> +		src->algo ? &hash_algos[src->algo] : repo->hash_algo;

Hmm, if we're only using "repo" to grab its hash_algo member as a
fallback, should this function be broken down into 2, one to get the
fallback algo and another that has the meat of this function without the
"repo" parameter?

I haven't read the rest of the series yet, let me keep reading.

> +
> +	if (from == to) {
> +		if (src != dest)
> +			oidcpy(dest, src);
> +		return 0;
> +	}
> +	return -1;
> +}

It's curious to me that you treat same-hash-algo as a NOP here, but as a
BUG() in the other helper. Why the difference (perhaps add a comment)?

> +int convert_object_file(struct strbuf *outbuf,
> +			const struct git_hash_algo *from,
> +			const struct git_hash_algo *to,
> +			const void *buf, size_t len,
> +			enum object_type type,
> +			int gentle)

Is there a particular reason you chose to put the out parameter (outbuf)
at the beginning, rather than at the end? In the other helper function
the "dest" out param comes at the end. Style conflict...?

Also, you don't use buf or len, so you could have kept them out from
this patch (so that you only add them later as needed).

> +{
> +	int ret;
> +
> +	/* Don't call this function when no conversion is necessary */

Please avoid double-negation. How about simply

    /* Refuse nonsensical conversion */

or simply drop the comment (as the BUG() description already serves the
same purpose?)

> +	if ((from == to) || (type == OBJ_BLOB))
> +		BUG("Refusing noop object file conversion");

I think it would be better if you separated these out and gave them
different BUG() messages. If we barf with a BUG() it would be so much
more helpful if the message we get is as specific as possible, rather
than leaving us guessing whether (as in this case) we passed in
identical hash algos or whether we tried to handle a blob object. Since
you already have the switch statement below, the OBJ_BLOB case could go
there easily enough.

Nit: I think BUG() messages are not supposed to be capitalized.

> +	switch (type) {
> +	case OBJ_COMMIT:
> +	case OBJ_TREE:
> +	case OBJ_TAG:
> +	default:
> +		/* Not implemented yet, so fail. */
> +		ret = -1;
> +		break;
> +	}
> +	if (!ret)
> +		return 0;
> +	if (gentle) {
> +		strbuf_release(outbuf);

What does "gentle" mean here? But also if you are just freeing the
outbuf, then why not just name this param "free_outbuf"? But also it
seems a bit odd that the caller (who presumably owns the outbuf) is
asking a conversion function to possibly free it before returning.

> +		return ret;
> +	}
> +	die(_("Failed to convert object from %s to %s"),
> +		from->name, to->name);
> +}
> diff --git a/object-file-convert.h b/object-file-convert.h
> new file mode 100644
> index 000000000000..a4f802aa8eea
> --- /dev/null
> +++ b/object-file-convert.h
> @@ -0,0 +1,24 @@
> +#ifndef OBJECT_CONVERT_H
> +#define OBJECT_CONVERT_H
> +
> +struct repository;
> +struct object_id;
> +struct git_hash_algo;
> +struct strbuf;
> +#include "object.h"

It looks a bit odd that the forward declarations of these structs come
before the #include line. I don't think that's the pattern in our
codebase (maybe I'm wrong?).

In hindsight, the log message could have added that switching back
and forth between different hash algos is a fundamental (but currently
missing) operation, and that this is the reason why these conversion
functions (currently unfinished) are necessary as the first step for the
patch series. I realize that you've stated as much in your cover letter,
but it is always nice to have the intent embedded in the log message(s)
where applicable (such as the case in this patch that introduces a brand
new header file) to save future developers the hassle of looking up the
relevant cover letter.

Thanks.

> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> +		      const struct git_hash_algo *to, struct object_id *dest);
> +
> +/*
> + * Convert an object file from one hash algorithm to another algorithm.
> + * Return -1 on failure, 0 on success.
> + */
> +int convert_object_file(struct strbuf *outbuf,
> +			const struct git_hash_algo *from,
> +			const struct git_hash_algo *to,
> +			const void *buf, size_t len,
> +			enum object_type type,
> +			int gentle);
> +
> +#endif /* OBJECT_CONVERT_H */
> -- 
> 2.41.0

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 22/30] rev-parse: add an --output-object-format parameter
  2023-10-02  2:40   ` [PATCH v2 22/30] rev-parse: add an --output-object-format parameter Eric W. Biederman
@ 2024-02-08 16:25     ` Jean-Noël Avila
  0 siblings, 0 replies; 104+ messages in thread
From: Jean-Noël Avila @ 2024-02-08 16:25 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: git

Le 02/10/2023 à 04:40, Eric W. Biederman a écrit :
> +				else die(_("unsupported object format: %s"), arg);

"unsupported object format '%s'" already exist in translated strings.

One less similar string to translate.


Thank you.

> +			}
>   			if (opt_with_value(arg, "--short", &arg)) {
>   				filter &= ~(DO_FLAGS|DO_NOREV);




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids
  2023-10-02  2:40   ` [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids Eric W. Biederman
@ 2024-02-13  8:16     ` Linus Arver
  2024-02-15  6:22       ` Eric W. Biederman
  2024-02-13  8:31     ` Kristoffer Haugsbakk
  2024-02-15 11:21     ` Patrick Steinhardt
  2 siblings, 1 reply; 104+ messages in thread
From: Linus Arver @ 2024-02-13  8:16 UTC (permalink / raw)
  To: Eric W. Biederman, Junio C Hamano
  Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

"Eric W. Biederman" <ebiederm@gmail.com> writes:

> From: "Eric W. Biederman" <ebiederm@xmission.com>
>
> While looking at how to handle input of both SHA-1 and SHA-256 oids in
> get_oid_with_context, I realized that the oid_array in
> repo_for_each_abbrev might have more than one kind of oid stored in it
> simultaneously.
>
> Update to oid_array_append to ensure that oids added to an oid array

s/Update to/Update

> always have an algorithm set.
>
> Update void_hashcmp to first verify two oids use the same hash algorithm
> before comparing them to each other.
>
> With that oid-array should be safe to use with different kinds of

s/oid-array/oid_array

> oids simultaneously.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  oid-array.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/oid-array.c b/oid-array.c
> index 8e4717746c31..1f36651754ed 100644
> --- a/oid-array.c
> +++ b/oid-array.c
> @@ -6,12 +6,20 @@ void oid_array_append(struct oid_array *array, const struct object_id *oid)
>  {
>  	ALLOC_GROW(array->oid, array->nr + 1, array->alloc);
>  	oidcpy(&array->oid[array->nr++], oid);
> +	if (!oid->algo)
> +		oid_set_algo(&array->oid[array->nr - 1], the_hash_algo);

How come we can't set oid->algo _before_ we call oidcpy()? It seems odd
that we do the copy first and then modify what we just copied after the
fact, instead of making sure that the thing we want to copy is correct
before doing the copy.

But also, if we are going to make the oid object "correct" before
invoking oidcpy(), we might as well do it when the oid is first
created/used (in the caller(s) of this function). I don't demand that
you find/demonstrate where all these places are in this series (maybe
that's a hairy problem to tackle?), but it seems cleaner in principle to
fix the creation of oid objects instead of having to make oid users
clean up their act like this after using them.

>  	array->sorted = 0;
>  }
>  
> -static int void_hashcmp(const void *a, const void *b)
> +static int void_hashcmp(const void *va, const void *vb)
>  {
> -	return oidcmp(a, b);
> +	const struct object_id *a = va, *b = vb;
> +	int ret;
> +	if (a->algo == b->algo)
> +		ret = oidcmp(a, b);

This makes sense (per the commit message description) ...

> +	else
> +		ret = a->algo > b->algo ? 1 : -1;

... but this seems to go against it? I thought you wanted to only ever
compare hashes if they were of the same algo? It would be good to add a
comment explaining why this is OK (we are no longer doing a byte-by-byte
comparison of these oids any more here like we do for oidcmp() above
which boils down to calling memcmp()).

> +	return ret;

Also, in terms of style I think the "early return for errors" style
would be simpler to read. I.e.

    if (a->algo > b->algo)
        return 1;

    if (a->algo < b->algo)
        return -1;

    return oidcmd(a, b);

>  }
>  
>  void oid_array_sort(struct oid_array *array)
> -- 
> 2.41.0

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids
  2023-10-02  2:40   ` [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids Eric W. Biederman
  2024-02-13  8:16     ` Linus Arver
@ 2024-02-13  8:31     ` Kristoffer Haugsbakk
  2024-02-15  6:24       ` Eric W. Biederman
  2024-02-15 11:21     ` Patrick Steinhardt
  2 siblings, 1 reply; 104+ messages in thread
From: Kristoffer Haugsbakk @ 2024-02-13  8:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman,
	Junio C Hamano

On Mon, Oct 2, 2023, at 04:40, Eric W. Biederman wrote:
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Most of your patches have this sign-off line with your name quoted.

-- 
Kristoffer Haugsbakk



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 03/30] object-names: support input of oids in any supported hash
  2023-10-02  2:40   ` [PATCH v2 03/30] object-names: support input of oids in any supported hash Eric W. Biederman
@ 2024-02-13  9:33     ` Linus Arver
  2024-02-15 11:21     ` Patrick Steinhardt
  1 sibling, 0 replies; 104+ messages in thread
From: Linus Arver @ 2024-02-13  9:33 UTC (permalink / raw)
  To: Eric W. Biederman, Junio C Hamano
  Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

"Eric W. Biederman" <ebiederm@gmail.com> writes:

> From: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Support short oids encoded in any algorithm, while ensuring enough of
> the oid is specified to disambiguate between all of the oids in the
> repository encoded in any algorithm.
>
> By default have the code continue to only accept oids specified in the
> storage hash algorithm of the repository, but when something is
> ambiguous display all of the possible oids from any accepted oid
> encoding.
>
> A new flag is added GET_OID_HASH_ANY that when supplied causes the
> code to accept oids specified in any hash algorithm, and to return the
> oids that were resolved.
>
> This implements the functionality that allows both SHA-1 and SHA-256
> object names, from the "Object names on the command line" section of
> the hash function transition document.
>
> Care is taken in get_short_oid so that when the result is ambiguous
> the output remains the same if GIT_OID_HASH_ANY was not supplied.  If
> GET_OID_HASH_ANY was supplied objects of any hash algorithm that match
> the prefix are displayed.
>
> This required updating repo_for_each_abbrev to give it a parameter so
> that it knows to look at all hash algorithms.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  builtin/rev-parse.c |  2 +-
>  hash-ll.h           |  1 +
>  object-name.c       | 46 ++++++++++++++++++++++++++++++++++-----------
>  object-name.h       |  3 ++-
>  4 files changed, 39 insertions(+), 13 deletions(-)
>
> diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
> index fde8861ca4e0..43e96765400c 100644
> --- a/builtin/rev-parse.c
> +++ b/builtin/rev-parse.c
> @@ -882,7 +882,7 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
>  				continue;
>  			}
>  			if (skip_prefix(arg, "--disambiguate=", &arg)) {
> -				repo_for_each_abbrev(the_repository, arg,
> +				repo_for_each_abbrev(the_repository, arg, the_hash_algo,
>  						     show_abbrev, NULL);
>  				continue;
>  			}
> diff --git a/hash-ll.h b/hash-ll.h
> index 10d84cc20888..2cfde63ae1cf 100644
> --- a/hash-ll.h
> +++ b/hash-ll.h
> @@ -145,6 +145,7 @@ struct object_id {
>  #define GET_OID_RECORD_PATH     0200
>  #define GET_OID_ONLY_TO_DIE    04000
>  #define GET_OID_REQUIRE_PATH  010000
> +#define GET_OID_HASH_ANY      020000

So far the "GET_OID_*" flags sound like the "GET_OID" is the prefix and
the remaining text describes the behavior. So in this sense, the
behavior here would be "HASH_ANY" and I think "ANY_HASH" is better. But
also, going by the description of this flag in the commit message, I
think "ACCEPT_ANY_HASH" is still better.
>
>  #define GET_OID_DISAMBIGUATORS \
>  	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
> diff --git a/object-name.c b/object-name.c
> index 0bfa29dbbfe9..7dd6e5e47566 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -25,6 +25,7 @@
>  #include "midx.h"
>  #include "commit-reach.h"
>  #include "date.h"
> +#include "object-file-convert.h"
>
>  static int get_oid_oneline(struct repository *r, const char *, struct object_id *, struct commit_list *);
>
> @@ -49,6 +50,7 @@ struct disambiguate_state {
>
>  static void update_candidates(struct disambiguate_state *ds, const struct object_id *current)
>  {
> +	/* The hash algorithm of current has already been filtered */

Is this new comment describing existing behavior before this patch or
after?

>  	if (ds->always_call_fn) {
>  		ds->ambiguous = ds->fn(ds->repo, current, ds->cb_data) ? 1 : 0;
>  		return;
> @@ -134,6 +136,8 @@ static void unique_in_midx(struct multi_pack_index *m,
>  {
>  	uint32_t num, i, first = 0;
>  	const struct object_id *current = NULL;
> +	int len = ds->len > ds->repo->hash_algo->hexsz ?
> +		ds->repo->hash_algo->hexsz : ds->len;

Nit: So this is just trying to use the shorter length between ds->len
and hexsz. Would it be good to encode this information into the variable
name, such as "len_short" or similar? And if so, flipping the ">" to
"<" would be more natural, like

    int shorter_len = ds->len < ds->repo->hash_algo->hexsz ?
                      ds->len : ds->repo->hash_algo->hexsz;

because it reads "if ds->len is shorter, use it; otherwise use hexsz".

>  	num = m->num_objects;
>
>  	if (!num)
> @@ -149,7 +153,7 @@ static void unique_in_midx(struct multi_pack_index *m,
>  	for (i = first; i < num && !ds->ambiguous; i++) {
>  		struct object_id oid;
>  		current = nth_midxed_object_oid(&oid, m, i);
> -		if (!match_hash(ds->len, ds->bin_pfx.hash, current->hash))
> +		if (!match_hash(len, ds->bin_pfx.hash, current->hash))
>  			break;
>  		update_candidates(ds, current);
>  	}
> @@ -159,6 +163,8 @@ static void unique_in_pack(struct packed_git *p,
>  			   struct disambiguate_state *ds)
>  {
>  	uint32_t num, i, first = 0;
> +	int len = ds->len > ds->repo->hash_algo->hexsz ?
> +		ds->repo->hash_algo->hexsz : ds->len;

Ditto.

>
>  	if (p->multi_pack_index)
>  		return;
> @@ -177,7 +183,7 @@ static void unique_in_pack(struct packed_git *p,
>  	for (i = first; i < num && !ds->ambiguous; i++) {
>  		struct object_id oid;
>  		nth_packed_object_id(&oid, p, i);
> -		if (!match_hash(ds->len, ds->bin_pfx.hash, oid.hash))
> +		if (!match_hash(len, ds->bin_pfx.hash, oid.hash))
>  			break;
>  		update_candidates(ds, &oid);
>  	}
> @@ -188,6 +194,10 @@ static void find_short_packed_object(struct disambiguate_state *ds)
>  	struct multi_pack_index *m;
>  	struct packed_git *p;
>
> +	/* Skip, unless oids from the storage hash algorithm are wanted */

Perhaps a simpler phrasing would be

    /* Only accept oids specified in the storage hash algorithm of the repository. */

which is closer to the wording of the commit message?

> +	if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
> +		return;
> +
>  	for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
>  	     m = m->next)
>  		unique_in_midx(m, ds);
> @@ -326,11 +336,12 @@ int set_disambiguate_hint_config(const char *var, const char *value)
>
>  static int init_object_disambiguation(struct repository *r,
>  				      const char *name, int len,
> +				      const struct git_hash_algo *algo,
>  				      struct disambiguate_state *ds)
>  {
>  	int i;
>
> -	if (len < MINIMUM_ABBREV || len > the_hash_algo->hexsz)
> +	if (len < MINIMUM_ABBREV || len > GIT_MAX_HEXSZ)
>  		return -1;
>
>  	memset(ds, 0, sizeof(*ds));
> @@ -357,6 +368,7 @@ static int init_object_disambiguation(struct repository *r,
>  	ds->len = len;
>  	ds->hex_pfx[len] = '\0';
>  	ds->repo = r;
> +	ds->bin_pfx.algo = algo ? hash_algo_by_ptr(algo) : GIT_HASH_UNKNOWN;
>  	prepare_alt_odb(r);
>  	return 0;
>  }
> @@ -491,9 +503,10 @@ static int repo_collect_ambiguous(struct repository *r UNUSED,
>  	return collect_ambiguous(oid, data);
>  }
>
> -static int sort_ambiguous(const void *a, const void *b, void *ctx)
> +static int sort_ambiguous(const void *va, const void *vb, void *ctx)
>  {
>  	struct repository *sort_ambiguous_repo = ctx;
> +	const struct object_id *a = va, *b = vb;
>  	int a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
>  	int b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
>  	int a_type_sort;
> @@ -503,8 +516,12 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
>  	 * Sorts by hash within the same object type, just as
>  	 * oid_array_for_each_unique() would do.
>  	 */
> -	if (a_type == b_type)
> -		return oidcmp(a, b);
> +	if (a_type == b_type) {
> +		if (a->algo == b->algo)
> +			return oidcmp(a, b);
> +		else
> +			return a->algo > b->algo ? 1 : -1;
> +	}

This is duplicated from the previous patch (void_hashcmp). Do we want to
avoid a performance penalty from calling out to a common function?

>  	/*
>  	 * Between object types show tags, then commits, and finally
> @@ -533,8 +550,12 @@ static enum get_oid_result get_short_oid(struct repository *r,
>  	int status;
>  	struct disambiguate_state ds;
>  	int quietly = !!(flags & GET_OID_QUIETLY);
> +	const struct git_hash_algo *algo = r->hash_algo;

I see some existing uses of the name "algop" (presumably to mean
pointer-to-algorithm) and wonder if we should do the same here, for
consistency.

> +
> +	if (flags & GET_OID_HASH_ANY)
> +		algo = NULL;

If we look at the change to init_object_disambiguation() above, it looks
like the "algo" here is only used to find a GIT_HASH_* constant. So it
seems a bit roundabout to use a NULL here, then inside
init_object_disambiguation() do

    ds->bin_pfx.algo = algo ? hash_algo_by_ptr(algo) : GIT_HASH_UNKNOWN;

to convert the NULL to GIT_HASH_UNKNOWN when we could probably just pass
in the GIT_HASH_* constant directly from the caller.

> -	if (init_object_disambiguation(r, name, len, &ds) < 0)
> +	if (init_object_disambiguation(r, name, len, algo, &ds) < 0)
>  		return -1;
>
>  	if (HAS_MULTI_BITS(flags & GET_OID_DISAMBIGUATORS))
> @@ -588,7 +609,7 @@ static enum get_oid_result get_short_oid(struct repository *r,
>  		if (!ds.ambiguous)
>  			ds.fn = NULL;
>
> -		repo_for_each_abbrev(r, ds.hex_pfx, collect_ambiguous, &collect);
> +		repo_for_each_abbrev(r, ds.hex_pfx, algo, collect_ambiguous, &collect);
>  		sort_ambiguous_oid_array(r, &collect);
>
>  		if (oid_array_for_each(&collect, show_ambiguous_object, &out))
> @@ -610,13 +631,14 @@ static enum get_oid_result get_short_oid(struct repository *r,
>  }
>
>  int repo_for_each_abbrev(struct repository *r, const char *prefix,
> +			 const struct git_hash_algo *algo,
>  			 each_abbrev_fn fn, void *cb_data)
>  {
>  	struct oid_array collect = OID_ARRAY_INIT;
>  	struct disambiguate_state ds;
>  	int ret;
>
> -	if (init_object_disambiguation(r, prefix, strlen(prefix), &ds) < 0)
> +	if (init_object_disambiguation(r, prefix, strlen(prefix), algo, &ds) < 0)
>  		return -1;
>
>  	ds.always_call_fn = 1;
> @@ -787,10 +809,12 @@ void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
>  int repo_find_unique_abbrev_r(struct repository *r, char *hex,
>  			      const struct object_id *oid, int len)
>  {
> +	const struct git_hash_algo *algo =
> +		oid->algo ? &hash_algos[oid->algo] : r->hash_algo;

It would be nice to have symmetry with the style in get_short_oid() and
instead do

   const struct git_hash_algo *algo = r->hash_algo;

   if (oid->algo)
       algo = &hash_algos[oid->algo];

>  	struct disambiguate_state ds;
>  	struct min_abbrev_data mad;
>  	struct object_id oid_ret;
> -	const unsigned hexsz = r->hash_algo->hexsz;
> +	const unsigned hexsz = algo->hexsz;
>
>  	if (len < 0) {
>  		unsigned long count = repo_approximate_object_count(r);
> @@ -826,7 +850,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
>
>  	find_abbrev_len_packed(&mad);
>
> -	if (init_object_disambiguation(r, hex, mad.cur_len, &ds) < 0)
> +	if (init_object_disambiguation(r, hex, mad.cur_len, algo, &ds) < 0)
>  		return -1;
>
>  	ds.fn = repo_extend_abbrev_len;
> diff --git a/object-name.h b/object-name.h
> index 9ae522307148..064ddc97d1fe 100644
> --- a/object-name.h
> +++ b/object-name.h
> @@ -67,7 +67,8 @@ enum get_oid_result get_oid_with_context(struct repository *repo, const char *st
>
>
>  typedef int each_abbrev_fn(const struct object_id *oid, void *);
> -int repo_for_each_abbrev(struct repository *r, const char *prefix, each_abbrev_fn, void *);
> +int repo_for_each_abbrev(struct repository *r, const char *prefix,
> +			 const struct git_hash_algo *algo, each_abbrev_fn, void *);
>
>  int set_disambiguate_hint_config(const char *var, const char *value);
>
> --
> 2.41.0

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 04/30] repository: add a compatibility hash algorithm
  2023-10-02  2:40   ` [PATCH v2 04/30] repository: add a compatibility hash algorithm Eric W. Biederman
@ 2024-02-13 10:02     ` Linus Arver
  2024-02-15 11:22     ` Patrick Steinhardt
  1 sibling, 0 replies; 104+ messages in thread
From: Linus Arver @ 2024-02-13 10:02 UTC (permalink / raw)
  To: Eric W. Biederman, Junio C Hamano
  Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

"Eric W. Biederman" <ebiederm@gmail.com> writes:

> From: "Eric W. Biederman" <ebiederm@xmission.com>
>
> We currently have support for using a full stage 4 SHA-256
> implementation.  However, we'd like to support interoperability with
> SHA-1 repositories as well.  The transition plan anticipates a
> compatibility hash algorithm configuration option that we can use to
> implement support for this.

Perhaps add

    See section "Object names on the command line" in
    git/Documentation/technical/hash-function-transition.txt .

? That section does not use the language "compatibility hash algorithm"
though, and I think "hash compatibility option" is easier to say.

Hmm, or are you talking about "compatObjectFormat" discussed in that doc?

> Let's add an element to the repository
> structure that indicates the compatibility hash algorithm so we can use
> it when we need to consider interoperability between algorithms.

How about just

    Add a hash compatibility option to the repository structure to
    consider interoperability between hash algorithms.

?

Aside: already we are seeing multiple keywords "compatibility",
"transition", "interoperability" to all mean roughly similar things. I
hope we can settle on just one (ideally) in the codebase by the end of
this series.

> Add a helper function repo_set_compat_hash_algo that takes a
> compatibility hash algorithm and sets "repo->compat_hash_algo".  If
> GIT_HASH_UNKNOWN is passed as the compatibility hash algorithm
> "repo->compat_hash_algo" is set to NULL.
>
> For now, the code results in "repo->compat_hash_algo" always being set
> to NULL, but that will change once a configuration option is added.

It's not clear to me whether you are talking about a config option to
describe the different stages of transition around algorithms, or a hash
algorithm itself (SHA1, SHA256, UNKNOWN).

> Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  repository.c | 8 ++++++++
>  repository.h | 4 ++++
>  setup.c      | 3 +++
>  3 files changed, 15 insertions(+)
>
> diff --git a/repository.c b/repository.c
> index a7679ceeaa45..80252b79e93e 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -104,6 +104,13 @@ void repo_set_hash_algo(struct repository *repo, int hash_algo)
>  	repo->hash_algo = &hash_algos[hash_algo];
>  }
>  
> +void repo_set_compat_hash_algo(struct repository *repo, int algo)
> +{
> +	if (hash_algo_by_ptr(repo->hash_algo) == algo)
> +		BUG("hash_algo and compat_hash_algo match");
> +	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
> +}

Ah, OK. So we are talking about an algorithm itself. Looking at this
code it seems like a compat_hash_algo is something like "the hash
algorithm I want my repository to start using but which has not
already". Such a description would have been useful in the commit
message.

Nit: I think 

    BUG("compat_hash_algo may not be the same as hash_algo");

is more natural because the error message should explain the badness of
the behavior rather than merely reflect the triggering condition. And
the "star of the show" here is the new compat_hash_algo member, so it
makes sense to emphasize that more as the only subject of the sentence
instead of grouping it together with hash_algo (given them equal
importance).

> +
>  /*
>   * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
>   * Return 0 upon success and a non-zero value upon failure.
> @@ -184,6 +191,7 @@ int repo_init(struct repository *repo,
>  		goto error;
>  
>  	repo_set_hash_algo(repo, format.hash_algo);
> +	repo_set_compat_hash_algo(repo, GIT_HASH_UNKNOWN);
>  	repo->repository_format_worktree_config = format.worktree_config;
>  
>  	/* take ownership of format.partial_clone */
> diff --git a/repository.h b/repository.h
> index 5f18486f6465..bf3fc601cc53 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -160,6 +160,9 @@ struct repository {
>  	/* Repository's current hash algorithm, as serialized on disk. */
>  	const struct git_hash_algo *hash_algo;
>  
> +	/* Repository's compatibility hash algorithm. */

Perhaps add "May not be the same as hash_algo." ?

> +	const struct git_hash_algo *compat_hash_algo;
> +
>  	/* A unique-id for tracing purposes. */
>  	int trace2_repo_id;
>  
> @@ -199,6 +202,7 @@ void repo_set_gitdir(struct repository *repo, const char *root,
>  		     const struct set_gitdir_args *extra_args);
>  void repo_set_worktree(struct repository *repo, const char *path);
>  void repo_set_hash_algo(struct repository *repo, int algo);
> +void repo_set_compat_hash_algo(struct repository *repo, int compat_algo);
>  void initialize_the_repository(void);
>  RESULT_MUST_BE_USED
>  int repo_init(struct repository *r, const char *gitdir, const char *worktree);
> diff --git a/setup.c b/setup.c
> index 18927a847b86..aa8bf5da5226 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1564,6 +1564,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  		}
>  		if (startup_info->have_repository) {
>  			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
> +			repo_set_compat_hash_algo(the_repository,
> +						  GIT_HASH_UNKNOWN);
>  			the_repository->repository_format_worktree_config =
>  				repo_fmt.worktree_config;
>  			/* take ownership of repo_fmt.partial_clone */
> @@ -1657,6 +1659,7 @@ void check_repository_format(struct repository_format *fmt)
>  	check_repository_format_gently(get_git_dir(), fmt, NULL);
>  	startup_info->have_repository = 1;
>  	repo_set_hash_algo(the_repository, fmt->hash_algo);
> +	repo_set_compat_hash_algo(the_repository, GIT_HASH_UNKNOWN);
>  	the_repository->repository_format_worktree_config =
>  		fmt->worktree_config;
>  	the_repository->repository_format_partial_clone =
> -- 
> 2.41.0

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects
  2023-10-02  2:40   ` [PATCH v2 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
@ 2024-02-14  7:20     ` Linus Arver
  2024-02-15  5:33       ` Eric W. Biederman
  2024-02-15 11:22     ` Patrick Steinhardt
  1 sibling, 1 reply; 104+ messages in thread
From: Linus Arver @ 2024-02-14  7:20 UTC (permalink / raw)
  To: Eric W. Biederman, Junio C Hamano
  Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman

"Eric W. Biederman" <ebiederm@gmail.com> writes:

> From: "brian m. carlson" <sandals@crustytoothpaste.net>
>
> As part of the transition plan, we'd like to add a file in the .git
> directory that maps loose objects between SHA-1 and SHA-256.  Let's
> implement the specification in the transition plan and store this data
> on a per-repository basis in struct repository.

Could you explain a bit what the specification is, exactly? That would
save reviewers the trouble of comparing the large chunk of new code here
with the transition plan (which I assume is still
Documentation/technical/hash-function-transition.txt.

Also, are there any slight deviations from the specification for reasons
that may not be obvious to reviewers?

I would prefer if this patch is split up into smaller preparatory
patches, starting with the core essentials and then building it up
piece-by-piece to make it easier to review.

> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  Makefile              |   1 +
>  loose.c               | 246 ++++++++++++++++++++++++++++++++++++++++++
>  loose.h               |  22 ++++
>  object-file-convert.c |  14 ++-
>  object-store-ll.h     |   3 +
>  object.c              |   2 +
>  repository.c          |   6 ++
>  7 files changed, 293 insertions(+), 1 deletion(-)
>  create mode 100644 loose.c
>  create mode 100644 loose.h
>
> diff --git a/Makefile b/Makefile
> index f7e824f25cda..3c18664def9a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1053,6 +1053,7 @@ LIB_OBJS += list-objects-filter.o
>  LIB_OBJS += list-objects.o
>  LIB_OBJS += lockfile.o
>  LIB_OBJS += log-tree.o
> +LIB_OBJS += loose.o

The name "loose" appears to be a bit too generic for something with such
a specialized role (_mapping_ of loose objects). Would
"loose-object-map" be a better name?

>  LIB_OBJS += ls-refs.o
>  LIB_OBJS += mailinfo.o
>  LIB_OBJS += mailmap.o
> diff --git a/loose.c b/loose.c
> new file mode 100644
> index 000000000000..6ba73cc84dca
> --- /dev/null
> +++ b/loose.c
> @@ -0,0 +1,246 @@
> +#include "git-compat-util.h"
> +#include "hash.h"
> +#include "path.h"
> +#include "object-store.h"
> +#include "hex.h"
> +#include "wrapper.h"
> +#include "gettext.h"
> +#include "loose.h"
> +#include "lockfile.h"
> +
> +static const char *loose_object_header = "# loose-object-idx\n";

IDK what the "loose-object-idx" is vs the "loose-object-map", but I
guess I need to read more of the code.

But also, I am at my limits here and am unable to review this patch as
is (too big for me to chew at once, sorry).

I'll pause my review of this series here to give Eric B some time to
respond. Thanks.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 00/30] initial support for multiple hash functions
  2024-02-08  0:24     ` Linus Arver
  2024-02-08  6:11       ` Patrick Steinhardt
@ 2024-02-14  7:36       ` Linus Arver
  1 sibling, 0 replies; 104+ messages in thread
From: Linus Arver @ 2024-02-14  7:36 UTC (permalink / raw)
  To: Junio C Hamano, git
  Cc: Eric W. Biederman, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

Linus Arver <linusa@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>>
>>> This addresses all of the known test failures from v1 of this set of
>>> changes.  In particular I have reworked commit_tree_extended which
>>> was flagged by smatch, -Werror=array-bounds, and the leak detector.
>>>
>>> One functional bug was fixed in repo_for_each_abbrev where it was
>>> mistakenly displaying too many ambiguous oids.
>>>
>>> I am posting this so that people review and testing of this patchset
>>> won't be distracted by the known and fixed issues.
>>
>> We haven't seen any reviews on this second round, and have had it
>> outside 'next' for too long.  I am tempted to say that we merge it
>> to 'next' and see if anybody screams at this point.
>
> FWIW out of all the "Needs review" topics this one seemed like the most
> deserving of another pair of eyes, and I was planning to review some of
> the patches here this week + the weekend. If my review takes too long
> (taking longer than this weekend) I can give another update next week
> saying "too hard for me, please don't wait for me" to unblock you from
> merging to next.
>
> Thanks.

Unfortunately I don't think I can finish reviewing the rest of the
series (after all this time I've only been able to review just 4 out of
30 patches). I'm also stuck on trying to understand patch 5, as there is
a lot going on there.

FWIW a lot (perhaps all?) of my comments so far were around readability
and not material to the actual design or approach of anything AFAICS.
So, it's time for me to say "don't bother waiting for me" as I said
(predicted?) earlier.

Don't bother waiting for me. Thanks.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects
  2024-02-14  7:20     ` Linus Arver
@ 2024-02-15  5:33       ` Eric W. Biederman
  0 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2024-02-15  5:33 UTC (permalink / raw)
  To: Linus Arver
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

Linus Arver <linusa@google.com> writes:

> I'll pause my review of this series here to give Eric B some time to
> respond. Thanks.

I will respond shortly.  The re-awakening of the review process came
just as I am in the middle of something else that is taking a lot of
cycles.

Eric


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids
  2024-02-13  8:16     ` Linus Arver
@ 2024-02-15  6:22       ` Eric W. Biederman
  2024-02-16  0:16         ` Linus Arver
  0 siblings, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2024-02-15  6:22 UTC (permalink / raw)
  To: Linus Arver
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

Linus Arver <linusa@google.com> writes:

> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>
>> From: "Eric W. Biederman" <ebiederm@xmission.com>
>>
>> While looking at how to handle input of both SHA-1 and SHA-256 oids in
>> get_oid_with_context, I realized that the oid_array in
>> repo_for_each_abbrev might have more than one kind of oid stored in it
>> simultaneously.
>>
>> Update to oid_array_append to ensure that oids added to an oid array
>
> s/Update to/Update
>
>> always have an algorithm set.
>>
>> Update void_hashcmp to first verify two oids use the same hash algorithm
>> before comparing them to each other.
>>
>> With that oid-array should be safe to use with different kinds of
>
> s/oid-array/oid_array
>
>> oids simultaneously.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  oid-array.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/oid-array.c b/oid-array.c
>> index 8e4717746c31..1f36651754ed 100644
>> --- a/oid-array.c
>> +++ b/oid-array.c
>> @@ -6,12 +6,20 @@ void oid_array_append(struct oid_array *array, const struct object_id *oid)
>>  {
>>  	ALLOC_GROW(array->oid, array->nr + 1, array->alloc);
>>  	oidcpy(&array->oid[array->nr++], oid);
>> +	if (!oid->algo)
>> +		oid_set_algo(&array->oid[array->nr - 1], the_hash_algo);
>
> How come we can't set oid->algo _before_ we call oidcpy()? It seems odd
> that we do the copy first and then modify what we just copied after the
> fact, instead of making sure that the thing we want to copy is correct
> before doing the copy.
>
> But also, if we are going to make the oid object "correct" before
> invoking oidcpy(), we might as well do it when the oid is first
> created/used (in the caller(s) of this function). I don't demand that
> you find/demonstrate where all these places are in this series (maybe
> that's a hairy problem to tackle?), but it seems cleaner in principle to
> fix the creation of oid objects instead of having to make oid users
> clean up their act like this after using them.

There is a hairy problem here.

I believe for reasons of simplicity when the algo field was added to
struct object_id it was allowed to be zero for users that don't
particularly care about the hash algorithm, and are happy to use the git
default hash algorithm.

Me experience working on this set of change set showed that there
are oids without their algo set in all kinds of places in the tree.

I could not think of any sure way to go through the entire tree
and find those users, so I just made certain that oid array handled
that case.

I need algo to be set properly in the oids in the oid array so I
could extend oid_array to hold multiple kinds of oids at the same
time.  To allow multiple kinds of oids at the same time void_hashcmp
needs a simple and reliable way to tell what the algorithm is of
any given oid.

>
>>  	array->sorted = 0;
>>  }
>>  
>> -static int void_hashcmp(const void *a, const void *b)
>> +static int void_hashcmp(const void *va, const void *vb)
>>  {
>> -	return oidcmp(a, b);
>> +	const struct object_id *a = va, *b = vb;
>> +	int ret;
>> +	if (a->algo == b->algo)
>> +		ret = oidcmp(a, b);
>
> This makes sense (per the commit message description) ...
>
>> +	else
>> +		ret = a->algo > b->algo ? 1 : -1;
>
> ... but this seems to go against it? I thought you wanted to only ever
> compare hashes if they were of the same algo? It would be good to add a
> comment explaining why this is OK (we are no longer doing a byte-by-byte
> comparison of these oids any more here like we do for oidcmp() above
> which boils down to calling memcmp()).

So the goal of this change is for oid_array to be able to hold hashes
from multiple algorithms at the same time.

A key part of oid_array is oid_array_sort that allows functions such
as oid_array_lookup and oid_array_for_each_unique.

To that end there needs to be a total ordering of oids.

The function oidcmp is only defined when two oids are of the same
algorithm, it does not even test to detect the case of comparing
mismatched algorithms.

Therefore to get a total ordering of oids.  I must use oidcmp
when the algorithm is the same (the common case) or simply order
the oids by algorithm when the algorithms are different.

All of this is relevant to get_oid_with_context as get_oid_with_context
and it's helper functions contain the logic that determines what
we do when a hex string that is ambiguous is specified.

In the ambiguous case all of the possible candidates are placed in
an oid_array, sorted and then displayed.

With a repository that can knows both the sha1 and the sha256 oid
of it's objects it is possible for a short oid to match both
some sha1 oids and some sha256 oids.

>> +	return ret;
>
> Also, in terms of style I think the "early return for errors" style
> would be simpler to read. I.e.
>
>     if (a->algo > b->algo)
>         return 1;
>
>     if (a->algo < b->algo)
>         return -1;
>
>     return oidcmd(a, b);
>

I can see doing:
	if (a->algo == b->algo)
        	return oidcmp(a,b);

	if (a->algo > b->algo)
        	return 1;
        else
        	return -1;

Or even:
	if (a->algo == b->algo)
        	return oidcmp(a,b);

	return a->algo - b->algo;

Although I suspect using subtraction is a bit too clever.

Comparing for less than, and greater than, and then assuming
the values are equal hides what is important before calling
oidcmp which is that the algo values are equal.

>>  }
>>  
>>  void oid_array_sort(struct oid_array *array)
>> -- 
>> 2.41.0

Eric

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids
  2024-02-13  8:31     ` Kristoffer Haugsbakk
@ 2024-02-15  6:24       ` Eric W. Biederman
  0 siblings, 0 replies; 104+ messages in thread
From: Eric W. Biederman @ 2024-02-15  6:24 UTC (permalink / raw)
  To: Kristoffer Haugsbakk
  Cc: git, brian m. carlson, Eric Sunshine, Eric W. Biederman,
	Junio C Hamano

"Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:

> On Mon, Oct 2, 2023, at 04:40, Eric W. Biederman wrote:
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Most of your patches have this sign-off line with your name quoted.

At least for email syntax from which Signed-off-by syntax descends
having a period after my middle initial requires the name to be in
quotes.

Eric




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 01/30] object-file-convert: stubs for converting from one object format to another
  2023-10-02  2:40   ` [PATCH v2 01/30] object-file-convert: stubs for converting from one object format to another Eric W. Biederman
  2024-02-08  8:23     ` Linus Arver
@ 2024-02-15 11:21     ` Patrick Steinhardt
  1 sibling, 0 replies; 104+ messages in thread
From: Patrick Steinhardt @ 2024-02-15 11:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 5851 bytes --]

On Sun, Oct 01, 2023 at 09:40:05PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Two basic functions are provided:
> - convert_object_file Takes an object file it's type and hash algorithm
>   and converts it into the equivalent object file that would
>   have been generated with hash algorithm "to".
> 
>   For blob objects there is no conversation to be done and it is an
>   error to use this function on them.
> 
>   For commit, tree, and tag objects embedded oids are replaced by the
>   oids of the objects they refer to with those objects and their
>   object ids reencoded in with the hash algorithm "to".  Signatures
>   are rearranged so that they remain valid after the object has
>   been reencoded.
> 
> - repo_oid_to_algop which takes an oid that refers to an object file
>   and returns the oid of the equivalent object file generated
>   with the target hash algorithm.
> 
> The pair of files object-file-convert.c and object-file-convert.h are
> introduced to hold as much of this logic as possible to keep this
> conversion logic cleanly separated from everything else and in the
> hopes that someday the code will be clean enough git can support
> compiling out support for sha1 and the various conversion functions.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  Makefile              |  1 +
>  object-file-convert.c | 57 +++++++++++++++++++++++++++++++++++++++++++
>  object-file-convert.h | 24 ++++++++++++++++++
>  3 files changed, 82 insertions(+)
>  create mode 100644 object-file-convert.c
>  create mode 100644 object-file-convert.h
> 
> diff --git a/Makefile b/Makefile
> index 577630936535..f7e824f25cda 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1073,6 +1073,7 @@ LIB_OBJS += notes-cache.o
>  LIB_OBJS += notes-merge.o
>  LIB_OBJS += notes-utils.o
>  LIB_OBJS += notes.o
> +LIB_OBJS += object-file-convert.o
>  LIB_OBJS += object-file.o
>  LIB_OBJS += object-name.o
>  LIB_OBJS += object.o
> diff --git a/object-file-convert.c b/object-file-convert.c
> new file mode 100644
> index 000000000000..4777aba83636
> --- /dev/null
> +++ b/object-file-convert.c
> @@ -0,0 +1,57 @@
> +#include "git-compat-util.h"
> +#include "gettext.h"
> +#include "strbuf.h"
> +#include "repository.h"
> +#include "hash-ll.h"
> +#include "object.h"
> +#include "object-file-convert.h"
> +
> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> +		      const struct git_hash_algo *to, struct object_id *dest)
> +{
> +	/*
> +	 * If the source algorithm is not set, then we're using the
> +	 * default hash algorithm for that object.
> +	 */
> +	const struct git_hash_algo *from =
> +		src->algo ? &hash_algos[src->algo] : repo->hash_algo;
> +
> +	if (from == to) {
> +		if (src != dest)
> +			oidcpy(dest, src);
> +		return 0;
> +	}
> +	return -1;
> +}

In it's current form, `repo_oid_to_algop()` basically never does
anything except for copying over the object ID because we do not handle
the case where object hashes are different. I assume this is intended,
as we basically only provide stubs in this commit. But still, it would
help to document this in-code as well with a comment.

> +int convert_object_file(struct strbuf *outbuf,
> +			const struct git_hash_algo *from,
> +			const struct git_hash_algo *to,
> +			const void *buf, size_t len,
> +			enum object_type type,
> +			int gentle)
> +{
> +	int ret;
> +
> +	/* Don't call this function when no conversion is necessary */
> +	if ((from == to) || (type == OBJ_BLOB))
> +		BUG("Refusing noop object file conversion");

The extra braces around comparisons are unneeded and to the best of my
knowledge not customary in our code base. Also, error messages should
start with a lower-case letter.

> +	switch (type) {
> +	case OBJ_COMMIT:
> +	case OBJ_TREE:
> +	case OBJ_TAG:
> +	default:
> +		/* Not implemented yet, so fail. */
> +		ret = -1;
> +		break;
> +	}

It's a bit weird that we handle all object types except for blobs
separately, and then still have a `default` statement. I would've
thought that we should handle the object types specifically and set `ret
= -1` for all of them, and then the `default` case would instead call
`BUG()` due to an unknown object type.

> +	if (!ret)
> +		return 0;
> +	if (gentle) {
> +		strbuf_release(outbuf);
> +		return ret;
> +	}

Do you really intend to call `strbuf_release()` on the caller provided
buffer, or should this rather be `strbuf_reset()`? Memory management of
such an in/out parameter should typically be handled by the caller, not
the callee.

> +	die(_("Failed to convert object from %s to %s"),
> +		from->name, to->name);
> +}

The error message should start with a lower-case letter.

> diff --git a/object-file-convert.h b/object-file-convert.h
> new file mode 100644
> index 000000000000..a4f802aa8eea
> --- /dev/null
> +++ b/object-file-convert.h
> @@ -0,0 +1,24 @@
> +#ifndef OBJECT_CONVERT_H
> +#define OBJECT_CONVERT_H
> +
> +struct repository;
> +struct object_id;
> +struct git_hash_algo;
> +struct strbuf;
> +#include "object.h"
> +
> +int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> +		      const struct git_hash_algo *to, struct object_id *dest);
> +
> +/*
> + * Convert an object file from one hash algorithm to another algorithm.
> + * Return -1 on failure, 0 on success.
> + */
> +int convert_object_file(struct strbuf *outbuf,
> +			const struct git_hash_algo *from,
> +			const struct git_hash_algo *to,
> +			const void *buf, size_t len,
> +			enum object_type type,
> +			int gentle);

It would be nice to document what `gentle` does.

Patrick

> +#endif /* OBJECT_CONVERT_H */
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids
  2023-10-02  2:40   ` [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids Eric W. Biederman
  2024-02-13  8:16     ` Linus Arver
  2024-02-13  8:31     ` Kristoffer Haugsbakk
@ 2024-02-15 11:21     ` Patrick Steinhardt
  2 siblings, 0 replies; 104+ messages in thread
From: Patrick Steinhardt @ 2024-02-15 11:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 2282 bytes --]

On Sun, Oct 01, 2023 at 09:40:06PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> While looking at how to handle input of both SHA-1 and SHA-256 oids in
> get_oid_with_context, I realized that the oid_array in
> repo_for_each_abbrev might have more than one kind of oid stored in it
> simultaneously.
> 
> Update to oid_array_append to ensure that oids added to an oid array
> always have an algorithm set.
> 
> Update void_hashcmp to first verify two oids use the same hash algorithm
> before comparing them to each other.
> 
> With that oid-array should be safe to use with different kinds of
> oids simultaneously.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  oid-array.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/oid-array.c b/oid-array.c
> index 8e4717746c31..1f36651754ed 100644
> --- a/oid-array.c
> +++ b/oid-array.c
> @@ -6,12 +6,20 @@ void oid_array_append(struct oid_array *array, const struct object_id *oid)
>  {
>  	ALLOC_GROW(array->oid, array->nr + 1, array->alloc);
>  	oidcpy(&array->oid[array->nr++], oid);
> +	if (!oid->algo)
> +		oid_set_algo(&array->oid[array->nr - 1], the_hash_algo);

I feel like it's a design wart that `oid_array_append()` now started to
depend on repository discovery, adding an external dependency to it that
may cause very confusing behaviour. Are there for example ever cases
where we populate such an OID array before we have discovered the repo?
Can it happen that we use OID arrays in the context of a submodule that
has a different object ID than the main repository?

>  	array->sorted = 0;
>  }
>  
> -static int void_hashcmp(const void *a, const void *b)
> +static int void_hashcmp(const void *va, const void *vb)
>  {
> -	return oidcmp(a, b);
> +	const struct object_id *a = va, *b = vb;
> +	int ret;
> +	if (a->algo == b->algo)
> +		ret = oidcmp(a, b);
> +	else
> +		ret = a->algo > b->algo ? 1 : -1;

Okay, so we basically end up sorting first by the algorithm, and then we
sort all object IDs of a specific algorithm relative to each other.

Patrick

> +	return ret;
>  }
>  
>  void oid_array_sort(struct oid_array *array)
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 03/30] object-names: support input of oids in any supported hash
  2023-10-02  2:40   ` [PATCH v2 03/30] object-names: support input of oids in any supported hash Eric W. Biederman
  2024-02-13  9:33     ` Linus Arver
@ 2024-02-15 11:21     ` Patrick Steinhardt
  1 sibling, 0 replies; 104+ messages in thread
From: Patrick Steinhardt @ 2024-02-15 11:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 10216 bytes --]

On Sun, Oct 01, 2023 at 09:40:07PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Support short oids encoded in any algorithm, while ensuring enough of
> the oid is specified to disambiguate between all of the oids in the
> repository encoded in any algorithm.
> 
> By default have the code continue to only accept oids specified in the
> storage hash algorithm of the repository, but when something is
> ambiguous display all of the possible oids from any accepted oid
> encoding.
> 
> A new flag is added GET_OID_HASH_ANY that when supplied causes the
> code to accept oids specified in any hash algorithm, and to return the
> oids that were resolved.
> 
> This implements the functionality that allows both SHA-1 and SHA-256
> object names, from the "Object names on the command line" section of
> the hash function transition document.
> 
> Care is taken in get_short_oid so that when the result is ambiguous
> the output remains the same if GIT_OID_HASH_ANY was not supplied.  If
> GET_OID_HASH_ANY was supplied objects of any hash algorithm that match
> the prefix are displayed.
> 
> This required updating repo_for_each_abbrev to give it a parameter so
> that it knows to look at all hash algorithms.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  builtin/rev-parse.c |  2 +-
>  hash-ll.h           |  1 +
>  object-name.c       | 46 ++++++++++++++++++++++++++++++++++-----------
>  object-name.h       |  3 ++-
>  4 files changed, 39 insertions(+), 13 deletions(-)
> 
> diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
> index fde8861ca4e0..43e96765400c 100644
> --- a/builtin/rev-parse.c
> +++ b/builtin/rev-parse.c
> @@ -882,7 +882,7 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
>  				continue;
>  			}
>  			if (skip_prefix(arg, "--disambiguate=", &arg)) {
> -				repo_for_each_abbrev(the_repository, arg,
> +				repo_for_each_abbrev(the_repository, arg, the_hash_algo,
>  						     show_abbrev, NULL);
>  				continue;
>  			}
> diff --git a/hash-ll.h b/hash-ll.h
> index 10d84cc20888..2cfde63ae1cf 100644
> --- a/hash-ll.h
> +++ b/hash-ll.h
> @@ -145,6 +145,7 @@ struct object_id {
>  #define GET_OID_RECORD_PATH     0200
>  #define GET_OID_ONLY_TO_DIE    04000
>  #define GET_OID_REQUIRE_PATH  010000
> +#define GET_OID_HASH_ANY      020000
>  
>  #define GET_OID_DISAMBIGUATORS \
>  	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
> diff --git a/object-name.c b/object-name.c
> index 0bfa29dbbfe9..7dd6e5e47566 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -25,6 +25,7 @@
>  #include "midx.h"
>  #include "commit-reach.h"
>  #include "date.h"
> +#include "object-file-convert.h"
>  
>  static int get_oid_oneline(struct repository *r, const char *, struct object_id *, struct commit_list *);
>  
> @@ -49,6 +50,7 @@ struct disambiguate_state {
>  
>  static void update_candidates(struct disambiguate_state *ds, const struct object_id *current)
>  {
> +	/* The hash algorithm of current has already been filtered */
>  	if (ds->always_call_fn) {
>  		ds->ambiguous = ds->fn(ds->repo, current, ds->cb_data) ? 1 : 0;
>  		return;
> @@ -134,6 +136,8 @@ static void unique_in_midx(struct multi_pack_index *m,
>  {
>  	uint32_t num, i, first = 0;
>  	const struct object_id *current = NULL;
> +	int len = ds->len > ds->repo->hash_algo->hexsz ?
> +		ds->repo->hash_algo->hexsz : ds->len;

`hexsz` is not an `int`, but a `size_t`. `match_hash()` of course uses
a third type `unsigned` instead, adding to the confusion.

>  	num = m->num_objects;
>  
>  	if (!num)
> @@ -149,7 +153,7 @@ static void unique_in_midx(struct multi_pack_index *m,
>  	for (i = first; i < num && !ds->ambiguous; i++) {
>  		struct object_id oid;
>  		current = nth_midxed_object_oid(&oid, m, i);
> -		if (!match_hash(ds->len, ds->bin_pfx.hash, current->hash))
> +		if (!match_hash(len, ds->bin_pfx.hash, current->hash))
>  			break;
>  		update_candidates(ds, current);
>  	}
> @@ -159,6 +163,8 @@ static void unique_in_pack(struct packed_git *p,
>  			   struct disambiguate_state *ds)
>  {
>  	uint32_t num, i, first = 0;
> +	int len = ds->len > ds->repo->hash_algo->hexsz ?
> +		ds->repo->hash_algo->hexsz : ds->len;
>  
>  	if (p->multi_pack_index)
>  		return;
> @@ -177,7 +183,7 @@ static void unique_in_pack(struct packed_git *p,
>  	for (i = first; i < num && !ds->ambiguous; i++) {
>  		struct object_id oid;
>  		nth_packed_object_id(&oid, p, i);
> -		if (!match_hash(ds->len, ds->bin_pfx.hash, oid.hash))
> +		if (!match_hash(len, ds->bin_pfx.hash, oid.hash))
>  			break;
>  		update_candidates(ds, &oid);
>  	}
> @@ -188,6 +194,10 @@ static void find_short_packed_object(struct disambiguate_state *ds)
>  	struct multi_pack_index *m;
>  	struct packed_git *p;
>  
> +	/* Skip, unless oids from the storage hash algorithm are wanted */
> +	if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
> +		return;
> +
>  	for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
>  	     m = m->next)
>  		unique_in_midx(m, ds);
> @@ -326,11 +336,12 @@ int set_disambiguate_hint_config(const char *var, const char *value)
>  
>  static int init_object_disambiguation(struct repository *r,
>  				      const char *name, int len,
> +				      const struct git_hash_algo *algo,
>  				      struct disambiguate_state *ds)
>  {
>  	int i;
>  
> -	if (len < MINIMUM_ABBREV || len > the_hash_algo->hexsz)
> +	if (len < MINIMUM_ABBREV || len > GIT_MAX_HEXSZ)
>  		return -1;

Isn't this loosening things up a bit too much? I'd have expected that we
would compare with `algo->hexsz`, unless `GET_OID_HASH_ANY` is set and
thus `algo == NULL`.

Patrick

>  	memset(ds, 0, sizeof(*ds));
> @@ -357,6 +368,7 @@ static int init_object_disambiguation(struct repository *r,
>  	ds->len = len;
>  	ds->hex_pfx[len] = '\0';
>  	ds->repo = r;
> +	ds->bin_pfx.algo = algo ? hash_algo_by_ptr(algo) : GIT_HASH_UNKNOWN;
>  	prepare_alt_odb(r);
>  	return 0;
>  }
> @@ -491,9 +503,10 @@ static int repo_collect_ambiguous(struct repository *r UNUSED,
>  	return collect_ambiguous(oid, data);
>  }
>  
> -static int sort_ambiguous(const void *a, const void *b, void *ctx)
> +static int sort_ambiguous(const void *va, const void *vb, void *ctx)
>  {
>  	struct repository *sort_ambiguous_repo = ctx;
> +	const struct object_id *a = va, *b = vb;
>  	int a_type = oid_object_info(sort_ambiguous_repo, a, NULL);
>  	int b_type = oid_object_info(sort_ambiguous_repo, b, NULL);
>  	int a_type_sort;
> @@ -503,8 +516,12 @@ static int sort_ambiguous(const void *a, const void *b, void *ctx)
>  	 * Sorts by hash within the same object type, just as
>  	 * oid_array_for_each_unique() would do.
>  	 */
> -	if (a_type == b_type)
> -		return oidcmp(a, b);
> +	if (a_type == b_type) {
> +		if (a->algo == b->algo)
> +			return oidcmp(a, b);
> +		else
> +			return a->algo > b->algo ? 1 : -1;
> +	}
>  
>  	/*
>  	 * Between object types show tags, then commits, and finally
> @@ -533,8 +550,12 @@ static enum get_oid_result get_short_oid(struct repository *r,
>  	int status;
>  	struct disambiguate_state ds;
>  	int quietly = !!(flags & GET_OID_QUIETLY);
> +	const struct git_hash_algo *algo = r->hash_algo;
> +
> +	if (flags & GET_OID_HASH_ANY)
> +		algo = NULL;
>  
> -	if (init_object_disambiguation(r, name, len, &ds) < 0)
> +	if (init_object_disambiguation(r, name, len, algo, &ds) < 0)
>  		return -1;
>  
>  	if (HAS_MULTI_BITS(flags & GET_OID_DISAMBIGUATORS))
> @@ -588,7 +609,7 @@ static enum get_oid_result get_short_oid(struct repository *r,
>  		if (!ds.ambiguous)
>  			ds.fn = NULL;
>  
> -		repo_for_each_abbrev(r, ds.hex_pfx, collect_ambiguous, &collect);
> +		repo_for_each_abbrev(r, ds.hex_pfx, algo, collect_ambiguous, &collect);
>  		sort_ambiguous_oid_array(r, &collect);
>  
>  		if (oid_array_for_each(&collect, show_ambiguous_object, &out))
> @@ -610,13 +631,14 @@ static enum get_oid_result get_short_oid(struct repository *r,
>  }
>  
>  int repo_for_each_abbrev(struct repository *r, const char *prefix,
> +			 const struct git_hash_algo *algo,
>  			 each_abbrev_fn fn, void *cb_data)
>  {
>  	struct oid_array collect = OID_ARRAY_INIT;
>  	struct disambiguate_state ds;
>  	int ret;
>  
> -	if (init_object_disambiguation(r, prefix, strlen(prefix), &ds) < 0)
> +	if (init_object_disambiguation(r, prefix, strlen(prefix), algo, &ds) < 0)
>  		return -1;
>  
>  	ds.always_call_fn = 1;
> @@ -787,10 +809,12 @@ void strbuf_add_unique_abbrev(struct strbuf *sb, const struct object_id *oid,
>  int repo_find_unique_abbrev_r(struct repository *r, char *hex,
>  			      const struct object_id *oid, int len)
>  {
> +	const struct git_hash_algo *algo =
> +		oid->algo ? &hash_algos[oid->algo] : r->hash_algo;
>  	struct disambiguate_state ds;
>  	struct min_abbrev_data mad;
>  	struct object_id oid_ret;
> -	const unsigned hexsz = r->hash_algo->hexsz;
> +	const unsigned hexsz = algo->hexsz;
>  
>  	if (len < 0) {
>  		unsigned long count = repo_approximate_object_count(r);
> @@ -826,7 +850,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
>  
>  	find_abbrev_len_packed(&mad);
>  
> -	if (init_object_disambiguation(r, hex, mad.cur_len, &ds) < 0)
> +	if (init_object_disambiguation(r, hex, mad.cur_len, algo, &ds) < 0)
>  		return -1;
>  
>  	ds.fn = repo_extend_abbrev_len;
> diff --git a/object-name.h b/object-name.h
> index 9ae522307148..064ddc97d1fe 100644
> --- a/object-name.h
> +++ b/object-name.h
> @@ -67,7 +67,8 @@ enum get_oid_result get_oid_with_context(struct repository *repo, const char *st
>  
>  
>  typedef int each_abbrev_fn(const struct object_id *oid, void *);
> -int repo_for_each_abbrev(struct repository *r, const char *prefix, each_abbrev_fn, void *);
> +int repo_for_each_abbrev(struct repository *r, const char *prefix,
> +			 const struct git_hash_algo *algo, each_abbrev_fn, void *);
>  
>  int set_disambiguate_hint_config(const char *var, const char *value);
>  
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 04/30] repository: add a compatibility hash algorithm
  2023-10-02  2:40   ` [PATCH v2 04/30] repository: add a compatibility hash algorithm Eric W. Biederman
  2024-02-13 10:02     ` Linus Arver
@ 2024-02-15 11:22     ` Patrick Steinhardt
  1 sibling, 0 replies; 104+ messages in thread
From: Patrick Steinhardt @ 2024-02-15 11:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 4674 bytes --]

On Sun, Oct 01, 2023 at 09:40:08PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> We currently have support for using a full stage 4 SHA-256
> implementation.

What is a "full stage 4 SHA-256 implementation"? I was assuming that you
referred to "Documentation/technical/hash-function-transition.txt", but
it does not mention stages either.

> However, we'd like to support interoperability with
> SHA-1 repositories as well.  The transition plan anticipates a
> compatibility hash algorithm configuration option that we can use to
> implement support for this.  Let's add an element to the repository
> structure that indicates the compatibility hash algorithm so we can use
> it when we need to consider interoperability between algorithms.
> 
> Add a helper function repo_set_compat_hash_algo that takes a
> compatibility hash algorithm and sets "repo->compat_hash_algo".  If
> GIT_HASH_UNKNOWN is passed as the compatibility hash algorithm
> "repo->compat_hash_algo" is set to NULL.
> 
> For now, the code results in "repo->compat_hash_algo" always being set
> to NULL, but that will change once a configuration option is added.
> 
> Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  repository.c | 8 ++++++++
>  repository.h | 4 ++++
>  setup.c      | 3 +++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/repository.c b/repository.c
> index a7679ceeaa45..80252b79e93e 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -104,6 +104,13 @@ void repo_set_hash_algo(struct repository *repo, int hash_algo)
>  	repo->hash_algo = &hash_algos[hash_algo];
>  }
>  
> +void repo_set_compat_hash_algo(struct repository *repo, int algo)
> +{
> +	if (hash_algo_by_ptr(repo->hash_algo) == algo)
> +		BUG("hash_algo and compat_hash_algo match");
> +	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
> +}
> +
>  /*
>   * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
>   * Return 0 upon success and a non-zero value upon failure.
> @@ -184,6 +191,7 @@ int repo_init(struct repository *repo,
>  		goto error;
>  
>  	repo_set_hash_algo(repo, format.hash_algo);
> +	repo_set_compat_hash_algo(repo, GIT_HASH_UNKNOWN);
>  	repo->repository_format_worktree_config = format.worktree_config;
>  
>  	/* take ownership of format.partial_clone */
> diff --git a/repository.h b/repository.h
> index 5f18486f6465..bf3fc601cc53 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -160,6 +160,9 @@ struct repository {
>  	/* Repository's current hash algorithm, as serialized on disk. */
>  	const struct git_hash_algo *hash_algo;
>  
> +	/* Repository's compatibility hash algorithm. */
> +	const struct git_hash_algo *compat_hash_algo;
> +
>  	/* A unique-id for tracing purposes. */
>  	int trace2_repo_id;
>  
> @@ -199,6 +202,7 @@ void repo_set_gitdir(struct repository *repo, const char *root,
>  		     const struct set_gitdir_args *extra_args);
>  void repo_set_worktree(struct repository *repo, const char *path);
>  void repo_set_hash_algo(struct repository *repo, int algo);
> +void repo_set_compat_hash_algo(struct repository *repo, int compat_algo);
>  void initialize_the_repository(void);
>  RESULT_MUST_BE_USED
>  int repo_init(struct repository *r, const char *gitdir, const char *worktree);
> diff --git a/setup.c b/setup.c
> index 18927a847b86..aa8bf5da5226 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1564,6 +1564,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  		}
>  		if (startup_info->have_repository) {
>  			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
> +			repo_set_compat_hash_algo(the_repository,
> +						  GIT_HASH_UNKNOWN);
>  			the_repository->repository_format_worktree_config =
>  				repo_fmt.worktree_config;
>  			/* take ownership of repo_fmt.partial_clone */
> @@ -1657,6 +1659,7 @@ void check_repository_format(struct repository_format *fmt)
>  	check_repository_format_gently(get_git_dir(), fmt, NULL);
>  	startup_info->have_repository = 1;
>  	repo_set_hash_algo(the_repository, fmt->hash_algo);
> +	repo_set_compat_hash_algo(the_repository, GIT_HASH_UNKNOWN);
>  	the_repository->repository_format_worktree_config =
>  		fmt->worktree_config;
>  	the_repository->repository_format_partial_clone =

There's also `init_db()`, where we call `repo_set_hash_algo()`. Would we
have to call `repo_set_compat_hash_algo()` there, too? There are some
other locations when handling remotes or clones, but I don't think those
are relevant right now.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects
  2023-10-02  2:40   ` [PATCH v2 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
  2024-02-14  7:20     ` Linus Arver
@ 2024-02-15 11:22     ` Patrick Steinhardt
  1 sibling, 0 replies; 104+ messages in thread
From: Patrick Steinhardt @ 2024-02-15 11:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 17174 bytes --]

On Sun, Oct 01, 2023 at 09:40:09PM -0500, Eric W. Biederman wrote:
> From: "brian m. carlson" <sandals@crustytoothpaste.net>
> 
> As part of the transition plan, we'd like to add a file in the .git
> directory that maps loose objects between SHA-1 and SHA-256.  Let's
> implement the specification in the transition plan and store this data
> on a per-repository basis in struct repository.
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  Makefile              |   1 +
>  loose.c               | 246 ++++++++++++++++++++++++++++++++++++++++++
>  loose.h               |  22 ++++
>  object-file-convert.c |  14 ++-
>  object-store-ll.h     |   3 +
>  object.c              |   2 +
>  repository.c          |   6 ++
>  7 files changed, 293 insertions(+), 1 deletion(-)
>  create mode 100644 loose.c
>  create mode 100644 loose.h
> 
> diff --git a/Makefile b/Makefile
> index f7e824f25cda..3c18664def9a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1053,6 +1053,7 @@ LIB_OBJS += list-objects-filter.o
>  LIB_OBJS += list-objects.o
>  LIB_OBJS += lockfile.o
>  LIB_OBJS += log-tree.o
> +LIB_OBJS += loose.o
>  LIB_OBJS += ls-refs.o
>  LIB_OBJS += mailinfo.o
>  LIB_OBJS += mailmap.o
> diff --git a/loose.c b/loose.c
> new file mode 100644
> index 000000000000..6ba73cc84dca
> --- /dev/null
> +++ b/loose.c

When reading "loose" I immediately think about loose objects, only. I
would not consider this about mapping object IDs, which I expect would
also happen for packed objects?

It very much seems like you explicitly only care about loose objects in
the code here, which is weird to me. If that is in fact intentional
because we learn to store the compat object hash in pack files over the
course of this patch seires then it would make sense to explain this a
bit more in depth.

> @@ -0,0 +1,246 @@
> +#include "git-compat-util.h"
> +#include "hash.h"
> +#include "path.h"
> +#include "object-store.h"
> +#include "hex.h"
> +#include "wrapper.h"
> +#include "gettext.h"
> +#include "loose.h"
> +#include "lockfile.h"
> +
> +static const char *loose_object_header = "# loose-object-idx\n";
> +
> +static inline int should_use_loose_object_map(struct repository *repo)
> +{
> +	return repo->compat_hash_algo && repo->gitdir;
> +}
> +
> +void loose_object_map_init(struct loose_object_map **map)
> +{
> +	struct loose_object_map *m;
> +	m = xmalloc(sizeof(**map));
> +	m->to_compat = kh_init_oid_map();
> +	m->to_storage = kh_init_oid_map();
> +	*map = m;
> +}
> +
> +static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const struct object_id *value)
> +{
> +	khiter_t pos;
> +	int ret;
> +	struct object_id *stored;
> +
> +	pos = kh_put_oid_map(map, *key, &ret);
> +
> +	/* This item already exists in the map. */
> +	if (ret == 0)
> +		return 0;

Should we safeguard this and compare whether the key's value matches the
passed-in value? One of the more general themes that I'm worried about
is what happens when we hit hash collisions (e.g. two objects mapping to
the same SHA1, but different SHA256 hashes), and safeguarding us against
this possibility feels sensible to me.

> +	stored = xmalloc(sizeof(*stored));
> +	oidcpy(stored, value);
> +	kh_value(map, pos) = stored;
> +	return 1;
> +}
> +
> +static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir)
> +{
> +	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
> +	FILE *fp;
> +
> +	if (!dir->loose_map)
> +		loose_object_map_init(&dir->loose_map);
> +
> +	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
> +	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree);
> +
> +	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
> +	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob);
> +
> +	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
> +	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid);
> +
> +	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
> +	fp = fopen(path.buf, "rb");
> +	if (!fp) {
> +		strbuf_release(&path);
> +		return 0;

I think we should discern ENOENT from other errors. Failing gracefully
when the file doesn't exist may be sensible, but not when we failed due
to something like an I/O error.

> +	}
> +
> +	errno = 0;
> +	if (strbuf_getwholeline(&buf, fp, '\n') || strcmp(buf.buf, loose_object_header))
> +		goto err;
> +	while (!strbuf_getline_lf(&buf, fp)) {
> +		const char *p;
> +		struct object_id oid, compat_oid;
> +		if (parse_oid_hex_algop(buf.buf, &oid, &p, repo->hash_algo) ||
> +		    *p++ != ' ' ||
> +		    parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) ||
> +		    p != buf.buf + buf.len)
> +			goto err;
> +		insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid);
> +		insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid);
> +	}

Is the actual format specified anywhere? I have to wonder about the
scalability of such a format that uses a simple line-based format for
every object ID. Two main concerns:

  1. If the format is unsorted and we simply append to it whenever the
     repo gains new objects then we are forced to always load the
     complete map into memory. This would be quite inefficient in larger
     repositories that have millions of objects. Every line contains two
     object hashes as well as two whitespace characters, which amounts
     to `(2 + 40 + 64) * $numobjects` many bytes.

     For linux.git with more than 10 million objects, the map would thus
     be around 1GB in size. Loading that into memory and converting it
     into maps feels prohibitively expensive to me.

  2. If the format was sorted then we could perform binary searches
     inside the format to look up object IDs because we know that each
     line has a fixed length. On the other hand, adding new objects
     would require us to rewrite the whole file every time.

I think loading the complete object map into memory is simply too
expensive in any larger "real-world" repository. But rewriting a sorted
file format every time we add new objects feels sufficiently expensive,
too. Neither of these properties sounds like it would be feasible to use
for larger Git hosting platforms. So I think we should put some more
thought into this.

Some proposals:

  - We shouldn't store hex characters but raw object IDs, thus reducing
    the size of the file by almost half.

  - We should store the file sorted so that we can avoid loading it into
    memory and do binary searches.

  - We might grow this into a "stack" of object maps so that it becomes
    easier to add new objects to the map without having to rewrite it
    every time. With geometric repacking this should be somewhat
    manageable.

We don't have to do all of this right from the beginning, I just want to
start the discussion around this.

> +	strbuf_release(&buf);
> +	strbuf_release(&path);
> +	return errno ? -1 : 0;

It feels quite fragile to me to check for `errno` in this way. Should we
instead check `ferror(fp)`?

> +err:
> +	strbuf_release(&buf);
> +	strbuf_release(&path);
> +	return -1;
> +}

We could deduplicate the error paths by storing the return value into an
`int ret`.

> +int repo_read_loose_object_map(struct repository *repo)
> +{
> +	struct object_directory *dir;
> +
> +	if (!should_use_loose_object_map(repo))
> +		return 0;
> +
> +	prepare_alt_odb(repo);
> +
> +	for (dir = repo->objects->odb; dir; dir = dir->next) {
> +		if (load_one_loose_object_map(repo, dir) < 0) {
> +			return -1;
> +		}
> +	}

The braces here are not needed.

> +	return 0;
> +}
> +
> +int repo_write_loose_object_map(struct repository *repo)
> +{
> +	kh_oid_map_t *map = repo->objects->odb->loose_map->to_compat;
> +	struct lock_file lock;
> +	int fd;
> +	khiter_t iter;
> +	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
> +
> +	if (!should_use_loose_object_map(repo))
> +		return 0;
> +
> +	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
> +	fd = hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1);
> +	iter = kh_begin(map);
> +	if (write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0)
> +		goto errout;
> +
> +	for (; iter != kh_end(map); iter++) {
> +		if (kh_exist(map, iter)) {
> +			if (oideq(&kh_key(map, iter), the_hash_algo->empty_tree) ||
> +			    oideq(&kh_key(map, iter), the_hash_algo->empty_blob))
> +				continue;
> +			strbuf_addf(&buf, "%s %s\n", oid_to_hex(&kh_key(map, iter)), oid_to_hex(kh_value(map, iter)));
> +			if (write_in_full(fd, buf.buf, buf.len) < 0)
> +				goto errout;
> +			strbuf_reset(&buf);
> +		}
> +	}
> +	strbuf_release(&buf);
> +	if (commit_lock_file(&lock) < 0) {
> +		error_errno(_("could not write loose object index %s"), path.buf);
> +		strbuf_release(&path);
> +		return -1;
> +	}
> +	strbuf_release(&path);
> +	return 0;
> +errout:
> +	rollback_lock_file(&lock);
> +	strbuf_release(&buf);
> +	error_errno(_("failed to write loose object index %s\n"), path.buf);
> +	strbuf_release(&path);
> +	return -1;

Same here, we should be able to combine cleanup of both the successful
and error paths. It's safe to call `rollback_lock_file()` even if the
file has already been committed.

> +}
> +
> +static int write_one_object(struct repository *repo, const struct object_id *oid,
> +			    const struct object_id *compat_oid)
> +{
> +	struct lock_file lock;
> +	int fd;
> +	struct stat st;
> +	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
> +
> +	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
> +	hold_lock_file_for_update_timeout(&lock, path.buf, LOCK_DIE_ON_ERROR, -1);
> +
> +	fd = open(path.buf, O_WRONLY | O_CREAT | O_APPEND, 0666);
> +	if (fd < 0)
> +		goto errout;
> +	if (fstat(fd, &st) < 0)
> +		goto errout;
> +	if (!st.st_size && write_in_full(fd, loose_object_header, strlen(loose_object_header)) < 0)
> +		goto errout;
> +
> +	strbuf_addf(&buf, "%s %s\n", oid_to_hex(oid), oid_to_hex(compat_oid));
> +	if (write_in_full(fd, buf.buf, buf.len) < 0)
> +		goto errout;
> +	if (close(fd))
> +		goto errout;

It's not safe to update the file in-place like this. A concurrent reader
may end up seeing partial lines and error out. Also, if we were to crash
we might easily end up with a corrupted mapping file.

> +	adjust_shared_perm(path.buf);
> +	rollback_lock_file(&lock);
> +	strbuf_release(&buf);
> +	strbuf_release(&path);
> +	return 0;
> +errout:
> +	error_errno(_("failed to write loose object index %s\n"), path.buf);
> +	close(fd);
> +	rollback_lock_file(&lock);
> +	strbuf_release(&buf);
> +	strbuf_release(&path);
> +	return -1;

Same.

> +}
> +
> +int repo_add_loose_object_map(struct repository *repo, const struct object_id *oid,
> +			      const struct object_id *compat_oid)
> +{
> +	int inserted = 0;
> +
> +	if (!should_use_loose_object_map(repo))
> +		return 0;
> +
> +	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_compat, oid, compat_oid);
> +	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_storage, compat_oid, oid);
> +	if (inserted)
> +		return write_one_object(repo, oid, compat_oid);
> +	return 0;
> +}
> +
> +int repo_loose_object_map_oid(struct repository *repo,
> +			      const struct object_id *src,
> +			      const struct git_hash_algo *to,
> +			      struct object_id *dest)
> +{
> +	struct object_directory *dir;
> +	kh_oid_map_t *map;
> +	khiter_t pos;
> +
> +	for (dir = repo->objects->odb; dir; dir = dir->next) {
> +		struct loose_object_map *loose_map = dir->loose_map;
> +		if (!loose_map)
> +			continue;
> +		map = (to == repo->compat_hash_algo) ?
> +			loose_map->to_compat :
> +			loose_map->to_storage;
> +		pos = kh_get_oid_map(map, *src);
> +		if (pos < kh_end(map)) {
> +			oidcpy(dest, kh_value(map, pos));
> +			return 0;
> +		}
> +	}
> +	return -1;
> +}
> +
> +void loose_object_map_clear(struct loose_object_map **map)

Nit: I'd rather call it `loose_object_map_release()`. `clear` typically
indicates that we clear contents, but do not end up freeing the
containing structure.

> +{
> +	struct loose_object_map *m = *map;
> +	struct object_id *oid;
> +
> +	if (!m)
> +		return;
> +
> +	kh_foreach_value(m->to_compat, oid, free(oid));
> +	kh_foreach_value(m->to_storage, oid, free(oid));
> +	kh_destroy_oid_map(m->to_compat);
> +	kh_destroy_oid_map(m->to_storage);
> +	free(m);
> +	*map = NULL;
> +}
> diff --git a/loose.h b/loose.h
> new file mode 100644
> index 000000000000..2c2957072c5f
> --- /dev/null
> +++ b/loose.h
> @@ -0,0 +1,22 @@
> +#ifndef LOOSE_H
> +#define LOOSE_H
> +
> +#include "khash.h"
> +
> +struct loose_object_map {
> +	kh_oid_map_t *to_compat;
> +	kh_oid_map_t *to_storage;
> +};

Any specific reason why you don't use `struct oidmap` here?

Patrick

> +void loose_object_map_init(struct loose_object_map **map);
> +void loose_object_map_clear(struct loose_object_map **map);
> +int repo_loose_object_map_oid(struct repository *repo,
> +			      const struct object_id *src,
> +			      const struct git_hash_algo *dest_algo,
> +			      struct object_id *dest);
> +int repo_add_loose_object_map(struct repository *repo, const struct object_id *oid,
> +			      const struct object_id *compat_oid);
> +int repo_read_loose_object_map(struct repository *repo);
> +int repo_write_loose_object_map(struct repository *repo);
> +
> +#endif
> diff --git a/object-file-convert.c b/object-file-convert.c
> index 4777aba83636..1ec945eaa17f 100644
> --- a/object-file-convert.c
> +++ b/object-file-convert.c
> @@ -4,6 +4,7 @@
>  #include "repository.h"
>  #include "hash-ll.h"
>  #include "object.h"
> +#include "loose.h"
>  #include "object-file-convert.h"
>  
>  int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
> @@ -21,7 +22,18 @@ int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
>  			oidcpy(dest, src);
>  		return 0;
>  	}
> -	return -1;
> +	if (repo_loose_object_map_oid(repo, src, to, dest)) {
> +		/*
> +		 * We may have loaded the object map at repo initialization but
> +		 * another process (perhaps upstream of a pipe from us) may have
> +		 * written a new object into the map.  If the object is missing,
> +		 * let's reload the map to see if the object has appeared.
> +		 */
> +		repo_read_loose_object_map(repo);
> +		if (repo_loose_object_map_oid(repo, src, to, dest))
> +			return -1;
> +	}
> +	return 0;
>  }
>  
>  int convert_object_file(struct strbuf *outbuf,
> diff --git a/object-store-ll.h b/object-store-ll.h
> index 26a3895c821c..bc76d6bec80d 100644
> --- a/object-store-ll.h
> +++ b/object-store-ll.h
> @@ -26,6 +26,9 @@ struct object_directory {
>  	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
>  	struct oidtree *loose_objects_cache;
>  
> +	/* Map between object IDs for loose objects. */
> +	struct loose_object_map *loose_map;
> +
>  	/*
>  	 * This is a temporary object store created by the tmp_objdir
>  	 * facility. Disable ref updates since the objects in the store
> diff --git a/object.c b/object.c
> index 2c61e4c86217..186a0a47c0fb 100644
> --- a/object.c
> +++ b/object.c
> @@ -13,6 +13,7 @@
>  #include "alloc.h"
>  #include "packfile.h"
>  #include "commit-graph.h"
> +#include "loose.h"
>  
>  unsigned int get_max_object_index(void)
>  {
> @@ -540,6 +541,7 @@ void free_object_directory(struct object_directory *odb)
>  {
>  	free(odb->path);
>  	odb_clear_loose_cache(odb);
> +	loose_object_map_clear(&odb->loose_map);
>  	free(odb);
>  }
>  
> diff --git a/repository.c b/repository.c
> index 80252b79e93e..6214f61cf4e7 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -14,6 +14,7 @@
>  #include "read-cache-ll.h"
>  #include "remote.h"
>  #include "setup.h"
> +#include "loose.h"
>  #include "submodule-config.h"
>  #include "sparse-index.h"
>  #include "trace2.h"
> @@ -109,6 +110,8 @@ void repo_set_compat_hash_algo(struct repository *repo, int algo)
>  	if (hash_algo_by_ptr(repo->hash_algo) == algo)
>  		BUG("hash_algo and compat_hash_algo match");
>  	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
> +	if (repo->compat_hash_algo)
> +		repo_read_loose_object_map(repo);
>  }
>  
>  /*
> @@ -201,6 +204,9 @@ int repo_init(struct repository *repo,
>  	if (worktree)
>  		repo_set_worktree(repo, worktree);
>  
> +	if (repo->compat_hash_algo)
> +		repo_read_loose_object_map(repo);
> +
>  	clear_repository_format(&format);
>  	return 0;
>  
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 06/30] loose: compatibilty short name support
  2023-10-02  2:40   ` [PATCH v2 06/30] loose: compatibilty short name support Eric W. Biederman
@ 2024-02-15 11:22     ` Patrick Steinhardt
  0 siblings, 0 replies; 104+ messages in thread
From: Patrick Steinhardt @ 2024-02-15 11:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 4497 bytes --]

On Sun, Oct 01, 2023 at 09:40:10PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Update loose_objects_cache when udpating the loose objects map.  This
> oidtree is used to discover which oids are possibilities when
> resolving short names, and it can support a mixture of sha1
> and sha256 oids.
> 
> With this any oid recorded objects/loose-objects-idx is usable
> for resolving an oid to an object.
> 
> To make this maintainable a helper insert_loose_map is factored
> out of load_one_loose_object_map and repo_add_loose_object_map,
> and then modified to also update the loose_objects_cache.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  loose.c | 37 +++++++++++++++++++++++++------------
>  1 file changed, 25 insertions(+), 12 deletions(-)
> 
> diff --git a/loose.c b/loose.c
> index 6ba73cc84dca..f6faa6216a08 100644
> --- a/loose.c
> +++ b/loose.c
> @@ -7,6 +7,7 @@
>  #include "gettext.h"
>  #include "loose.h"
>  #include "lockfile.h"
> +#include "oidtree.h"
>  
>  static const char *loose_object_header = "# loose-object-idx\n";
>  
> @@ -42,6 +43,21 @@ static int insert_oid_pair(kh_oid_map_t *map, const struct object_id *key, const
>  	return 1;
>  }
>  
> +static int insert_loose_map(struct object_directory *odb,
> +			    const struct object_id *oid,
> +			    const struct object_id *compat_oid)

I think it would've been nice to fold this into the preceding patch
already. At least I wanted to propose adding such a function to avoid
the duplication down below.

Patrick

> +{
> +	struct loose_object_map *map = odb->loose_map;
> +	int inserted = 0;
> +
> +	inserted |= insert_oid_pair(map->to_compat, oid, compat_oid);
> +	inserted |= insert_oid_pair(map->to_storage, compat_oid, oid);
> +	if (inserted)
> +		oidtree_insert(odb->loose_objects_cache, compat_oid);
> +
> +	return inserted;
> +}
> +
>  static int load_one_loose_object_map(struct repository *repo, struct object_directory *dir)
>  {
>  	struct strbuf buf = STRBUF_INIT, path = STRBUF_INIT;
> @@ -49,15 +65,14 @@ static int load_one_loose_object_map(struct repository *repo, struct object_dire
>  
>  	if (!dir->loose_map)
>  		loose_object_map_init(&dir->loose_map);
> +	if (!dir->loose_objects_cache) {
> +		ALLOC_ARRAY(dir->loose_objects_cache, 1);
> +		oidtree_init(dir->loose_objects_cache);
> +	}
>  
> -	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
> -	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_tree, repo->hash_algo->empty_tree);
> -
> -	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
> -	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->empty_blob, repo->hash_algo->empty_blob);
> -
> -	insert_oid_pair(dir->loose_map->to_compat, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
> -	insert_oid_pair(dir->loose_map->to_storage, repo->compat_hash_algo->null_oid, repo->hash_algo->null_oid);
> +	insert_loose_map(dir, repo->hash_algo->empty_tree, repo->compat_hash_algo->empty_tree);
> +	insert_loose_map(dir, repo->hash_algo->empty_blob, repo->compat_hash_algo->empty_blob);
> +	insert_loose_map(dir, repo->hash_algo->null_oid, repo->compat_hash_algo->null_oid);
>  
>  	strbuf_git_common_path(&path, repo, "objects/loose-object-idx");
>  	fp = fopen(path.buf, "rb");
> @@ -77,8 +92,7 @@ static int load_one_loose_object_map(struct repository *repo, struct object_dire
>  		    parse_oid_hex_algop(p, &compat_oid, &p, repo->compat_hash_algo) ||
>  		    p != buf.buf + buf.len)
>  			goto err;
> -		insert_oid_pair(dir->loose_map->to_compat, &oid, &compat_oid);
> -		insert_oid_pair(dir->loose_map->to_storage, &compat_oid, &oid);
> +		insert_loose_map(dir, &oid, &compat_oid);
>  	}
>  
>  	strbuf_release(&buf);
> @@ -197,8 +211,7 @@ int repo_add_loose_object_map(struct repository *repo, const struct object_id *o
>  	if (!should_use_loose_object_map(repo))
>  		return 0;
>  
> -	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_compat, oid, compat_oid);
> -	inserted |= insert_oid_pair(repo->objects->odb->loose_map->to_storage, compat_oid, oid);
> +	inserted = insert_loose_map(repo->objects->odb, oid, compat_oid);
>  	if (inserted)
>  		return write_one_object(repo, oid, compat_oid);
>  	return 0;
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 07/30] object-file: update the loose object map when writing loose objects
  2023-10-02  2:40   ` [PATCH v2 07/30] object-file: update the loose object map when writing loose objects Eric W. Biederman
@ 2024-02-15 11:22     ` Patrick Steinhardt
  0 siblings, 0 replies; 104+ messages in thread
From: Patrick Steinhardt @ 2024-02-15 11:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 14239 bytes --]

On Sun, Oct 01, 2023 at 09:40:11PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> To implement SHA1 compatibility on SHA256 repositories the loose
> object map needs to be updated whenver a loose object is written.

Not only when loose objects are written, but also when packfiles are
written e.g. when accepting a push via git-receive-pack(1). Basically,
whenever an object gets written into the main object database.

This also brings up another interesting angle: how will this work in the
context of alternate object directories? We have no control over new
objects being written into those, and thus the object mapping that we
have in our satellite repository that uses the alternate would be out of
date.

I think this is another indicator that stacking might be the right way
to go. Like that, the stack of object maps would be the main stack plus
all stack of alternates concatenated. Finding a mapping would then have
to go through all of these maps to find the desired object.

> Updating the loose object map this way allows git to support
> the old hash algorithm in constant time.

As mentioned before, appending objects is constant-time, but the reading
side is unfortunately not. It's probably more something like `O(nlogn)`
because we have to load all objects and add each of the objects into the
map, which I expect to be `O(logn)`. So the reading time isn't even
linear.

Patrick

> The functions write_loose_object, and stream_loose_object are
> the only two functions that write to the loose object store.
> 
> Update stream_loose_object to compute the compatibiilty hash, update
> the loose object, and then call repo_add_loose_object_map to update
> the loose object map.
> 
> Update write_object_file_flags to convert the object into
> it's compatibility encoding, hash the compatibility encoding,
> write the object, and then update the loose object map.
> 
> Update force_object_loose to lookup the hash of the compatibility
> encoding, write the loose object, and then update the loose object
> map.
> 
> Update write_object_file_literally to convert the object into it's
> compatibility hash encoding, hash the compatibility enconding, write
> the object, and then update the loose object map, when the type string
> is a known type.  For objects with an unknown type this results in a
> partially broken repository, as the objects are not mapped.
> 
> The point of write_object_file_literally is to generate a partially
> broken repository for testing.  For testing skipping writing the loose
> object map is much more useful than refusing to write the broken
> object at all.
> 
> Except that the loose objects are updated before the loose object map
> I have not done any analysis to see how robust this scheme is in the
> event of failure.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  object-file.c | 113 ++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 95 insertions(+), 18 deletions(-)
> 
> diff --git a/object-file.c b/object-file.c
> index 7dc0c4bfbba8..4e55f475b3b4 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -43,6 +43,8 @@
>  #include "setup.h"
>  #include "submodule.h"
>  #include "fsck.h"
> +#include "loose.h"
> +#include "object-file-convert.h"
>  
>  /* The maximum size for an object header. */
>  #define MAX_HEADER_LEN 32
> @@ -1952,9 +1954,12 @@ static int start_loose_object_common(struct strbuf *tmp_file,
>  				     const char *filename, unsigned flags,
>  				     git_zstream *stream,
>  				     unsigned char *buf, size_t buflen,
> -				     git_hash_ctx *c,
> +				     git_hash_ctx *c, git_hash_ctx *compat_c,
>  				     char *hdr, int hdrlen)
>  {
> +	struct repository *repo = the_repository;
> +	const struct git_hash_algo *algo = repo->hash_algo;
> +	const struct git_hash_algo *compat = repo->compat_hash_algo;
>  	int fd;
>  
>  	fd = create_tmpfile(tmp_file, filename);
> @@ -1974,14 +1979,18 @@ static int start_loose_object_common(struct strbuf *tmp_file,
>  	git_deflate_init(stream, zlib_compression_level);
>  	stream->next_out = buf;
>  	stream->avail_out = buflen;
> -	the_hash_algo->init_fn(c);
> +	algo->init_fn(c);
> +	if (compat && compat_c)
> +		compat->init_fn(compat_c);
>  
>  	/*  Start to feed header to zlib stream */
>  	stream->next_in = (unsigned char *)hdr;
>  	stream->avail_in = hdrlen;
>  	while (git_deflate(stream, 0) == Z_OK)
>  		; /* nothing */
> -	the_hash_algo->update_fn(c, hdr, hdrlen);
> +	algo->update_fn(c, hdr, hdrlen);
> +	if (compat && compat_c)
> +		compat->update_fn(compat_c, hdr, hdrlen);
>  
>  	return fd;
>  }
> @@ -1990,16 +1999,21 @@ static int start_loose_object_common(struct strbuf *tmp_file,
>   * Common steps for the inner git_deflate() loop for writing loose
>   * objects. Returns what git_deflate() returns.
>   */
> -static int write_loose_object_common(git_hash_ctx *c,
> +static int write_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c,
>  				     git_zstream *stream, const int flush,
>  				     unsigned char *in0, const int fd,
>  				     unsigned char *compressed,
>  				     const size_t compressed_len)
>  {
> +	struct repository *repo = the_repository;
> +	const struct git_hash_algo *algo = repo->hash_algo;
> +	const struct git_hash_algo *compat = repo->compat_hash_algo;
>  	int ret;
>  
>  	ret = git_deflate(stream, flush ? Z_FINISH : 0);
> -	the_hash_algo->update_fn(c, in0, stream->next_in - in0);
> +	algo->update_fn(c, in0, stream->next_in - in0);
> +	if (compat && compat_c)
> +		compat->update_fn(compat_c, in0, stream->next_in - in0);
>  	if (write_in_full(fd, compressed, stream->next_out - compressed) < 0)
>  		die_errno(_("unable to write loose object file"));
>  	stream->next_out = compressed;
> @@ -2014,15 +2028,21 @@ static int write_loose_object_common(git_hash_ctx *c,
>   * - End the compression of zlib stream.
>   * - Get the calculated oid to "oid".
>   */
> -static int end_loose_object_common(git_hash_ctx *c, git_zstream *stream,
> -				   struct object_id *oid)
> +static int end_loose_object_common(git_hash_ctx *c, git_hash_ctx *compat_c,
> +				   git_zstream *stream, struct object_id *oid,
> +				   struct object_id *compat_oid)
>  {
> +	struct repository *repo = the_repository;
> +	const struct git_hash_algo *algo = repo->hash_algo;
> +	const struct git_hash_algo *compat = repo->compat_hash_algo;
>  	int ret;
>  
>  	ret = git_deflate_end_gently(stream);
>  	if (ret != Z_OK)
>  		return ret;
> -	the_hash_algo->final_oid_fn(oid, c);
> +	algo->final_oid_fn(oid, c);
> +	if (compat && compat_c)
> +		compat->final_oid_fn(compat_oid, compat_c);
>  
>  	return Z_OK;
>  }
> @@ -2046,7 +2066,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>  
>  	fd = start_loose_object_common(&tmp_file, filename.buf, flags,
>  				       &stream, compressed, sizeof(compressed),
> -				       &c, hdr, hdrlen);
> +				       &c, NULL, hdr, hdrlen);
>  	if (fd < 0)
>  		return -1;
>  
> @@ -2056,14 +2076,14 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>  	do {
>  		unsigned char *in0 = stream.next_in;
>  
> -		ret = write_loose_object_common(&c, &stream, 1, in0, fd,
> +		ret = write_loose_object_common(&c, NULL, &stream, 1, in0, fd,
>  						compressed, sizeof(compressed));
>  	} while (ret == Z_OK);
>  
>  	if (ret != Z_STREAM_END)
>  		die(_("unable to deflate new object %s (%d)"), oid_to_hex(oid),
>  		    ret);
> -	ret = end_loose_object_common(&c, &stream, &parano_oid);
> +	ret = end_loose_object_common(&c, NULL, &stream, &parano_oid, NULL);
>  	if (ret != Z_OK)
>  		die(_("deflateEnd on object %s failed (%d)"), oid_to_hex(oid),
>  		    ret);
> @@ -2108,10 +2128,12 @@ static int freshen_packed_object(const struct object_id *oid)
>  int stream_loose_object(struct input_stream *in_stream, size_t len,
>  			struct object_id *oid)
>  {
> +	const struct git_hash_algo *compat = the_repository->compat_hash_algo;
> +	struct object_id compat_oid;
>  	int fd, ret, err = 0, flush = 0;
>  	unsigned char compressed[4096];
>  	git_zstream stream;
> -	git_hash_ctx c;
> +	git_hash_ctx c, compat_c;
>  	struct strbuf tmp_file = STRBUF_INIT;
>  	struct strbuf filename = STRBUF_INIT;
>  	int dirlen;
> @@ -2135,7 +2157,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
>  	 */
>  	fd = start_loose_object_common(&tmp_file, filename.buf, 0,
>  				       &stream, compressed, sizeof(compressed),
> -				       &c, hdr, hdrlen);
> +				       &c, &compat_c, hdr, hdrlen);
>  	if (fd < 0) {
>  		err = -1;
>  		goto cleanup;
> @@ -2153,7 +2175,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
>  			if (in_stream->is_finished)
>  				flush = 1;
>  		}
> -		ret = write_loose_object_common(&c, &stream, flush, in0, fd,
> +		ret = write_loose_object_common(&c, &compat_c, &stream, flush, in0, fd,
>  						compressed, sizeof(compressed));
>  		/*
>  		 * Unlike write_loose_object(), we do not have the entire
> @@ -2176,7 +2198,7 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
>  	 */
>  	if (ret != Z_STREAM_END)
>  		die(_("unable to stream deflate new object (%d)"), ret);
> -	ret = end_loose_object_common(&c, &stream, oid);
> +	ret = end_loose_object_common(&c, &compat_c, &stream, oid, &compat_oid);
>  	if (ret != Z_OK)
>  		die(_("deflateEnd on stream object failed (%d)"), ret);
>  	close_loose_object(fd, tmp_file.buf);
> @@ -2203,6 +2225,8 @@ int stream_loose_object(struct input_stream *in_stream, size_t len,
>  	}
>  
>  	err = finalize_object_file(tmp_file.buf, filename.buf);
> +	if (!err && compat)
> +		err = repo_add_loose_object_map(the_repository, oid, &compat_oid);
>  cleanup:
>  	strbuf_release(&tmp_file);
>  	strbuf_release(&filename);
> @@ -2213,17 +2237,38 @@ int write_object_file_flags(const void *buf, unsigned long len,
>  			    enum object_type type, struct object_id *oid,
>  			    unsigned flags)
>  {
> +	struct repository *repo = the_repository;
> +	const struct git_hash_algo *algo = repo->hash_algo;
> +	const struct git_hash_algo *compat = repo->compat_hash_algo;
> +	struct object_id compat_oid;
>  	char hdr[MAX_HEADER_LEN];
>  	int hdrlen = sizeof(hdr);
>  
> +	/* Generate compat_oid */
> +	if (compat) {
> +		if (type == OBJ_BLOB)
> +			hash_object_file(compat, buf, len, type, &compat_oid);
> +		else {
> +			struct strbuf converted = STRBUF_INIT;
> +			convert_object_file(&converted, algo, compat,
> +					    buf, len, type, 0);
> +			hash_object_file(compat, converted.buf, converted.len,
> +					 type, &compat_oid);
> +			strbuf_release(&converted);
> +		}
> +	}
> +
>  	/* Normally if we have it in the pack then we do not bother writing
>  	 * it out into .git/objects/??/?{38} file.
>  	 */
> -	write_object_file_prepare(the_hash_algo, buf, len, type, oid, hdr,
> -				  &hdrlen);
> +	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
>  	if (freshen_packed_object(oid) || freshen_loose_object(oid))
>  		return 0;
> -	return write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags);
> +	if (write_loose_object(oid, hdr, hdrlen, buf, len, 0, flags))
> +		return -1;
> +	if (compat)
> +		return repo_add_loose_object_map(repo, oid, &compat_oid);
> +	return 0;
>  }
>  
>  int write_object_file_literally(const void *buf, unsigned long len,
> @@ -2231,7 +2276,27 @@ int write_object_file_literally(const void *buf, unsigned long len,
>  				unsigned flags)
>  {
>  	char *header;
> +	struct repository *repo = the_repository;
> +	const struct git_hash_algo *algo = repo->hash_algo;
> +	const struct git_hash_algo *compat = repo->compat_hash_algo;
> +	struct object_id compat_oid;
>  	int hdrlen, status = 0;
> +	int compat_type = -1;
> +
> +	if (compat) {
> +		compat_type = type_from_string_gently(type, -1, 1);
> +		if (compat_type == OBJ_BLOB)
> +			hash_object_file(compat, buf, len, compat_type,
> +					 &compat_oid);
> +		else if (compat_type != -1) {
> +			struct strbuf converted = STRBUF_INIT;
> +			convert_object_file(&converted, algo, compat,
> +					    buf, len, compat_type, 0);
> +			hash_object_file(compat, converted.buf, converted.len,
> +					 compat_type, &compat_oid);
> +			strbuf_release(&converted);
> +		}
> +	}
>  
>  	/* type string, SP, %lu of the length plus NUL must fit this */
>  	hdrlen = strlen(type) + MAX_HEADER_LEN;
> @@ -2244,6 +2309,8 @@ int write_object_file_literally(const void *buf, unsigned long len,
>  	if (freshen_packed_object(oid) || freshen_loose_object(oid))
>  		goto cleanup;
>  	status = write_loose_object(oid, header, hdrlen, buf, len, 0, 0);
> +	if (compat_type != -1)
> +		return repo_add_loose_object_map(repo, oid, &compat_oid);
>  
>  cleanup:
>  	free(header);
> @@ -2252,9 +2319,12 @@ int write_object_file_literally(const void *buf, unsigned long len,
>  
>  int force_object_loose(const struct object_id *oid, time_t mtime)
>  {
> +	struct repository *repo = the_repository;
> +	const struct git_hash_algo *compat = repo->compat_hash_algo;
>  	void *buf;
>  	unsigned long len;
>  	struct object_info oi = OBJECT_INFO_INIT;
> +	struct object_id compat_oid;
>  	enum object_type type;
>  	char hdr[MAX_HEADER_LEN];
>  	int hdrlen;
> @@ -2267,8 +2337,15 @@ int force_object_loose(const struct object_id *oid, time_t mtime)
>  	oi.contentp = &buf;
>  	if (oid_object_info_extended(the_repository, oid, &oi, 0))
>  		return error(_("cannot read object for %s"), oid_to_hex(oid));
> +	if (compat) {
> +		if (repo_oid_to_algop(repo, oid, compat, &compat_oid))
> +			return error(_("cannot map object %s to %s"),
> +				     oid_to_hex(oid), compat->name);
> +	}
>  	hdrlen = format_object_header(hdr, sizeof(hdr), type, len);
>  	ret = write_loose_object(oid, hdr, hdrlen, buf, len, mtime, 0);
> +	if (!ret && compat)
> +		ret = repo_add_loose_object_map(the_repository, oid, &compat_oid);
>  	free(buf);
>  
>  	return ret;
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 00/30] initial support for multiple hash functions
  2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
                     ` (30 preceding siblings ...)
  2024-02-07 22:18   ` [PATCH v2 00/30] initial support for multiple hash functions Junio C Hamano
@ 2024-02-15 11:27   ` Patrick Steinhardt
  31 siblings, 0 replies; 104+ messages in thread
From: Patrick Steinhardt @ 2024-02-15 11:27 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 1377 bytes --]

On Sun, Oct 01, 2023 at 09:39:09PM -0500, Eric W. Biederman wrote:
> 
> This addresses all of the known test failures from v1 of this set of
> changes.  In particular I have reworked commit_tree_extended which
> was flagged by smatch, -Werror=array-bounds, and the leak detector.
> 
> One functional bug was fixed in repo_for_each_abbrev where it was
> mistakenly displaying too many ambiguous oids.
> 
> I am posting this so that people review and testing of this patchset
> won't be distracted by the known and fixed issues.

Thanks! I've reviewed this patch series up to patch 7.

I think the most important question mark I currently have is scalability
of the proposed object mapping format. The complexity to load the object
mappings is currently O(nlogn) and requires reading a file that is
easily hundreds of megabytes or even gigabytes in size.

I have a feeling that this needs to be addressed before such an object
mapping would be feasible for production use, or otherwise it would
incur too high a cost to be useful. I'm afraid that this will make the
whole series more complex to implement -- I'm sorry about that.

I've added comments and ideas regarding this issue on patch 6 and 7. It
could totally be that I'm missing the obvious though, or that my ideas
suck. Please don't hesitate to point that out if that is the case.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids
  2024-02-15  6:22       ` Eric W. Biederman
@ 2024-02-16  0:16         ` Linus Arver
  2024-02-16  4:48           ` Eric W. Biederman
  0 siblings, 1 reply; 104+ messages in thread
From: Linus Arver @ 2024-02-16  0:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

"Eric W. Biederman" <ebiederm@gmail.com> writes:

> Linus Arver <linusa@google.com> writes:
>
>> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>>
>>> From: "Eric W. Biederman" <ebiederm@xmission.com>
>>>
>>> While looking at how to handle input of both SHA-1 and SHA-256 oids in
>>> get_oid_with_context, I realized that the oid_array in
>>> repo_for_each_abbrev might have more than one kind of oid stored in it
>>> simultaneously.
>>>
>>> Update to oid_array_append to ensure that oids added to an oid array
>>
>> s/Update to/Update
>>
>>> always have an algorithm set.
>>>
>>> Update void_hashcmp to first verify two oids use the same hash algorithm
>>> before comparing them to each other.
>>>
>>> With that oid-array should be safe to use with different kinds of
>>
>> s/oid-array/oid_array
>>
>>> oids simultaneously.
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>>  oid-array.c | 12 ++++++++++--
>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/oid-array.c b/oid-array.c
>>> index 8e4717746c31..1f36651754ed 100644
>>> --- a/oid-array.c
>>> +++ b/oid-array.c
>>> @@ -6,12 +6,20 @@ void oid_array_append(struct oid_array *array, const struct object_id *oid)
>>>  {
>>>  	ALLOC_GROW(array->oid, array->nr + 1, array->alloc);
>>>  	oidcpy(&array->oid[array->nr++], oid);
>>> +	if (!oid->algo)
>>> +		oid_set_algo(&array->oid[array->nr - 1], the_hash_algo);
>>
>> How come we can't set oid->algo _before_ we call oidcpy()? It seems odd
>> that we do the copy first and then modify what we just copied after the
>> fact, instead of making sure that the thing we want to copy is correct
>> before doing the copy.
>>
>> But also, if we are going to make the oid object "correct" before
>> invoking oidcpy(), we might as well do it when the oid is first
>> created/used (in the caller(s) of this function). I don't demand that
>> you find/demonstrate where all these places are in this series (maybe
>> that's a hairy problem to tackle?), but it seems cleaner in principle to
>> fix the creation of oid objects instead of having to make oid users
>> clean up their act like this after using them.
>
> There is a hairy problem here.
>
> I believe for reasons of simplicity when the algo field was added to
> struct object_id it was allowed to be zero for users that don't
> particularly care about the hash algorithm, and are happy to use the git
> default hash algorithm.
>
> Me experience working on this set of change set showed that there
> are oids without their algo set in all kinds of places in the tree.

Ah, I see. Thanks for the clarification.

> I could not think of any sure way to go through the entire tree
> and find those users, so I just made certain that oid array handled
> that case.
>
> I need algo to be set properly in the oids in the oid array so I
> could extend oid_array to hold multiple kinds of oids at the same
> time.  To allow multiple kinds of oids at the same time void_hashcmp
> needs a simple and reliable way to tell what the algorithm is of
> any given oid.

Makes sense.

>>
>>>  	array->sorted = 0;
>>>  }
>>>  
>>> -static int void_hashcmp(const void *a, const void *b)
>>> +static int void_hashcmp(const void *va, const void *vb)
>>>  {
>>> -	return oidcmp(a, b);
>>> +	const struct object_id *a = va, *b = vb;
>>> +	int ret;
>>> +	if (a->algo == b->algo)
>>> +		ret = oidcmp(a, b);
>>
>> This makes sense (per the commit message description) ...
>>
>>> +	else
>>> +		ret = a->algo > b->algo ? 1 : -1;
>>
>> ... but this seems to go against it? I thought you wanted to only ever
>> compare hashes if they were of the same algo? It would be good to add a
>> comment explaining why this is OK (we are no longer doing a byte-by-byte
>> comparison of these oids any more here like we do for oidcmp() above
>> which boils down to calling memcmp()).
>
> So the goal of this change is for oid_array to be able to hold hashes
> from multiple algorithms at the same time.
>
> A key part of oid_array is oid_array_sort that allows functions such
> as oid_array_lookup and oid_array_for_each_unique.
>
> To that end there needs to be a total ordering of oids.
>
> The function oidcmp is only defined when two oids are of the same
> algorithm, it does not even test to detect the case of comparing
> mismatched algorithms.
>
> Therefore to get a total ordering of oids.  I must use oidcmp
> when the algorithm is the same (the common case) or simply order
> the oids by algorithm when the algorithms are different.
>
>
>
> All of this is relevant to get_oid_with_context as get_oid_with_context
> and it's helper functions contain the logic that determines what
> we do when a hex string that is ambiguous is specified.
>
> In the ambiguous case all of the possible candidates are placed in
> an oid_array, sorted and then displayed.
>
>
> With a repository that can knows both the sha1 and the sha256 oid
> of it's objects it is possible for a short oid to match both
> some sha1 oids and some sha256 oids.

Thanks for the additional clarification. I think a lot of this could
have been added as comments or perhaps in the commit message. The "short
id can match both sha1 or sha256" is a very real scenario we need to
consider in the sha1+sha256 world, indeed.

>>> +	return ret;
>>
>> Also, in terms of style I think the "early return for errors" style
>> would be simpler to read. I.e.
>>
>>     if (a->algo > b->algo)
>>         return 1;
>>
>>     if (a->algo < b->algo)
>>         return -1;
>>
>>     return oidcmd(a, b);
>>
>
> I can see doing:
> 	if (a->algo == b->algo)
>         	return oidcmp(a,b);
>
> 	if (a->algo > b->algo)
>         	return 1;
>         else
>         	return -1;
>
> Or even:
> 	if (a->algo == b->algo)
>         	return oidcmp(a,b);
>
> 	return a->algo - b->algo;
>
> Although I suspect using subtraction is a bit too clever.

Agreed.

> Comparing for less than, and greater than, and then assuming
> the values are equal hides what is important before calling
> oidcmp which is that the algo values are equal.

I would still prefer the "early return for errors" style even in this
case. This is because I much prefer to have the question "how can things
go wrong?" answered first, and dealt with, such that as I read
top-to-bottom I am left with less things I have to consider to
understand the "happy path". WRT emphasizing the "algos equal each
other" concern, a simple comment like

     /* Only compare equal algorithms. */
     return oidcmp(a, b);

seems sufficient.

But, of course it is possible (perhaps even likely) that my preferred
style is in the minority. Up to you. Thanks.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids
  2024-02-16  0:16         ` Linus Arver
@ 2024-02-16  4:48           ` Eric W. Biederman
  2024-02-17  1:59             ` Linus Arver
  0 siblings, 1 reply; 104+ messages in thread
From: Eric W. Biederman @ 2024-02-16  4:48 UTC (permalink / raw)
  To: Linus Arver
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

Linus Arver <linusa@google.com> writes:

>
> I would still prefer the "early return for errors" style even in this
> case. This is because I much prefer to have the question "how can things
> go wrong?" answered first, and dealt with, such that as I read
> top-to-bottom I am left with less things I have to consider to
> understand the "happy path". WRT emphasizing the "algos equal each
> other" concern, a simple comment like

void_hashcmp is a function that reports how two elements are ordered.
There is no error handling.

There are in fact two cases that need to be handled with oid_array being
able to contain more then one kind of hash at a time.

The two entries are ordered by oidcmp.
The two entries are ordered by hash algorithm.

The order that is maintained is first everything is ordered by hash
algorithm, then for entries of identical hash algorithm they are ordered
by oidcmp.

So I don't think the concept of early return for errors can ever apply
to any version of void_hashcmp.

Eric

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids
  2024-02-16  4:48           ` Eric W. Biederman
@ 2024-02-17  1:59             ` Linus Arver
  0 siblings, 0 replies; 104+ messages in thread
From: Linus Arver @ 2024-02-17  1:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Junio C Hamano, git, brian m. carlson, Eric Sunshine,
	Eric W. Biederman

"Eric W. Biederman" <ebiederm@gmail.com> writes:

> Linus Arver <linusa@google.com> writes:
>
>>
>> I would still prefer the "early return for errors" style even in this
>> case. This is because I much prefer to have the question "how can things
>> go wrong?" answered first, and dealt with, such that as I read
>> top-to-bottom I am left with less things I have to consider to
>> understand the "happy path". WRT emphasizing the "algos equal each
>> other" concern, a simple comment like
>
> void_hashcmp is a function that reports how two elements are ordered.
> There is no error handling.
>
> There are in fact two cases that need to be handled with oid_array being
> able to contain more then one kind of hash at a time.
>
> The two entries are ordered by oidcmp.
> The two entries are ordered by hash algorithm.
>
> The order that is maintained is first everything is ordered by hash
> algorithm, then for entries of identical hash algorithm they are ordered
> by oidcmp.
>
> So I don't think the concept of early return for errors can ever apply
> to any version of void_hashcmp.

Ah, yes. It appears that I simply read the code wrong. Thank you for the
clarification, and I agree with your analysis and preferred style.

Sorry for the noise.

^ permalink raw reply	[flat|nested] 104+ messages in thread

end of thread, other threads:[~2024-02-17  1:59 UTC | newest]

Thread overview: 104+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-27 19:49 [PATCH 00/30] Initial support for multiple hash functions Eric W. Biederman
2023-09-27 19:55 ` [PATCH 01/30] object-file-convert: Stubs for converting from one object format to another Eric W. Biederman
2023-09-27 20:42   ` Eric Sunshine
2023-10-02  1:22     ` Eric W. Biederman
2023-10-02  2:27       ` Eric Sunshine
2023-09-27 19:55 ` [PATCH 02/30] oid-array: Teach oid-array to handle multiple kinds of oids Eric W. Biederman
2023-09-27 23:20   ` Eric Sunshine
2023-09-27 19:55 ` [PATCH 03/30] object-names: Support input of oids in any supported hash Eric W. Biederman
2023-09-27 23:29   ` Eric Sunshine
2023-10-02  1:54     ` Eric W. Biederman
2023-09-27 19:55 ` [PATCH 04/30] repository: add a compatibility hash algorithm Eric W. Biederman
2023-09-27 19:55 ` [PATCH 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
2023-09-28  7:14   ` Eric Sunshine
2023-10-02  2:11     ` Eric W. Biederman
2023-10-02  2:36       ` Eric Sunshine
2023-09-27 19:55 ` [PATCH 06/30] loose: Compatibilty short name support Eric W. Biederman
2023-09-27 19:55 ` [PATCH 07/30] object-file: Update the loose object map when writing loose objects Eric W. Biederman
2023-09-27 19:55 ` [PATCH 08/30] object-file: Add a compat_oid_in parameter to write_object_file_flags Eric W. Biederman
2023-09-27 19:55 ` [PATCH 09/30] commit: write commits for both hashes Eric W. Biederman
2023-09-27 19:55 ` [PATCH 10/30] commit: Convert mergetag before computing the signature of a commit Eric W. Biederman
2023-09-27 19:55 ` [PATCH 11/30] commit: Export add_header_signature to support handling signatures on tags Eric W. Biederman
2023-09-27 19:55 ` [PATCH 12/30] tag: sign both hashes Eric W. Biederman
2023-09-27 19:55 ` [PATCH 13/30] cache: add a function to read an OID of a specific algorithm Eric W. Biederman
2023-09-27 19:55 ` [PATCH 14/30] object: Factor out parse_mode out of fast-import and tree-walk into in object.h Eric W. Biederman
2023-09-27 19:55 ` [PATCH 15/30] object-file-convert: add a function to convert trees between algorithms Eric W. Biederman
2023-09-27 19:55 ` [PATCH 16/30] object-file-convert: convert tag objects when writing Eric W. Biederman
2023-09-27 19:55 ` [PATCH 17/30] object-file-convert: Don't leak when converting tag objects Eric W. Biederman
2023-09-27 19:55 ` [PATCH 18/30] object-file-convert: convert commit objects when writing Eric W. Biederman
2023-09-27 19:55 ` [PATCH 19/30] object-file-convert: Convert commits that embed signed tags Eric W. Biederman
2023-09-27 19:55 ` [PATCH 20/30] object-file: Update object_info_extended to reencode objects Eric W. Biederman
2023-09-27 19:55 ` [PATCH 21/30] repository: Implement extensions.compatObjectFormat Eric W. Biederman
2023-09-27 21:39   ` Junio C Hamano
2023-09-28 20:18     ` Junio C Hamano
2023-09-29  0:50       ` Eric Biederman
2023-09-29 16:59       ` Eric W. Biederman
2023-09-29 18:48         ` Junio C Hamano
2023-10-02  0:48           ` Eric W. Biederman
2023-10-02  1:31     ` Eric W. Biederman
2023-09-27 19:55 ` [PATCH 22/30] rev-parse: Add an --output-object-format parameter Eric W. Biederman
2023-09-27 19:55 ` [PATCH 23/30] builtin/cat-file: Let the oid determine the output algorithm Eric W. Biederman
2023-09-27 19:55 ` [PATCH 24/30] tree-walk: init_tree_desc take an oid to get the hash algorithm Eric W. Biederman
2023-09-27 19:55 ` [PATCH 25/30] object-file: Handle compat objects in check_object_signature Eric W. Biederman
2023-09-27 19:55 ` [PATCH 26/30] builtin/ls-tree: Let the oid determine the output algorithm Eric W. Biederman
2023-09-27 19:55 ` [PATCH 27/30] test-lib: Compute the compatibility hash so tests may use it Eric W. Biederman
2023-09-27 19:55 ` [PATCH 28/30] t1006: Rename sha1 to oid Eric W. Biederman
2023-09-27 19:55 ` [PATCH 29/30] t1006: Test oid compatibility with cat-file Eric W. Biederman
2023-09-27 19:55 ` [PATCH 30/30] t1016-compatObjectFormat: Add tests to verify the conversion between objects Eric W. Biederman
2023-09-27 21:31 ` [PATCH 00/30] Initial support for multiple hash functions Junio C Hamano
2023-10-02  2:39 ` [PATCH v2 00/30] initial " Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 01/30] object-file-convert: stubs for converting from one object format to another Eric W. Biederman
2024-02-08  8:23     ` Linus Arver
2024-02-15 11:21     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 02/30] oid-array: teach oid-array to handle multiple kinds of oids Eric W. Biederman
2024-02-13  8:16     ` Linus Arver
2024-02-15  6:22       ` Eric W. Biederman
2024-02-16  0:16         ` Linus Arver
2024-02-16  4:48           ` Eric W. Biederman
2024-02-17  1:59             ` Linus Arver
2024-02-13  8:31     ` Kristoffer Haugsbakk
2024-02-15  6:24       ` Eric W. Biederman
2024-02-15 11:21     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 03/30] object-names: support input of oids in any supported hash Eric W. Biederman
2024-02-13  9:33     ` Linus Arver
2024-02-15 11:21     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 04/30] repository: add a compatibility hash algorithm Eric W. Biederman
2024-02-13 10:02     ` Linus Arver
2024-02-15 11:22     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 05/30] loose: add a mapping between SHA-1 and SHA-256 for loose objects Eric W. Biederman
2024-02-14  7:20     ` Linus Arver
2024-02-15  5:33       ` Eric W. Biederman
2024-02-15 11:22     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 06/30] loose: compatibilty short name support Eric W. Biederman
2024-02-15 11:22     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 07/30] object-file: update the loose object map when writing loose objects Eric W. Biederman
2024-02-15 11:22     ` Patrick Steinhardt
2023-10-02  2:40   ` [PATCH v2 08/30] object-file: add a compat_oid_in parameter to write_object_file_flags Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 09/30] commit: write commits for both hashes Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 10/30] commit: convert mergetag before computing the signature of a commit Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 11/30] commit: export add_header_signature to support handling signatures on tags Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 12/30] tag: sign both hashes Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 13/30] cache: add a function to read an OID of a specific algorithm Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 14/30] object: factor out parse_mode out of fast-import and tree-walk into in object.h Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 15/30] object-file-convert: add a function to convert trees between algorithms Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 16/30] object-file-convert: convert tag objects when writing Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 17/30] object-file-convert: don't leak when converting tag objects Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 18/30] object-file-convert: convert commit objects when writing Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 19/30] object-file-convert: convert commits that embed signed tags Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 20/30] object-file: update object_info_extended to reencode objects Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 21/30] repository: implement extensions.compatObjectFormat Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 22/30] rev-parse: add an --output-object-format parameter Eric W. Biederman
2024-02-08 16:25     ` Jean-Noël Avila
2023-10-02  2:40   ` [PATCH v2 23/30] builtin/cat-file: let the oid determine the output algorithm Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 24/30] tree-walk: init_tree_desc take an oid to get the hash algorithm Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 25/30] object-file: handle compat objects in check_object_signature Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 26/30] builtin/ls-tree: let the oid determine the output algorithm Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 27/30] test-lib: compute the compatibility hash so tests may use it Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 28/30] t1006: rename sha1 to oid Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 29/30] t1006: test oid compatibility with cat-file Eric W. Biederman
2023-10-02  2:40   ` [PATCH v2 30/30] t1016-compatObjectFormat: add tests to verify the conversion between objects Eric W. Biederman
2024-02-07 22:18   ` [PATCH v2 00/30] initial support for multiple hash functions Junio C Hamano
2024-02-08  0:24     ` Linus Arver
2024-02-08  6:11       ` Patrick Steinhardt
2024-02-14  7:36       ` Linus Arver
2024-02-15 11:27   ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).