Git development
 help / color / mirror / Atom feed
* Re: [PATCH v2 0/6] Support hashing objects larger than 4GB on Windows
From: Junio C Hamano @ 2026-06-16 16:09 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Philip Oakley, Patrick Steinhardt, Johannes Schindelin
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Changes since v1:
>
>  * Rebased to current master to resolve the conflicts with
>    ps/odb-source-loose

Very much appreciated.

>  * Dropped the !LONG_IS_64BIT prereq from the added/touched tests, as it is
>    now no longer needed

Good thing to do and see that the code works as well as it could,
whether a long is 32-bit or 64-bit ;-).

> Philip Oakley (6):
>   hash-object: demonstrate a >4GB/LLP64 problem
>   object-file.c: use size_t for header lengths
>   hash algorithms: use size_t for section lengths
>   hash-object --stdin: verify that it works with >4GB/LLP64
>   hash-object: add another >4GB/LLP64 test case
>   hash-object: add a >4GB/LLP64 test case using filtered input
>
>  object-file.c          | 14 +++++++-------
>  object-file.h          |  6 +++---
>  odb/source-files.c     |  2 +-
>  odb/source-inmemory.c  |  2 +-
>  odb/source-loose.c     |  4 ++--
>  odb/source.h           |  2 +-
>  sha1dc_git.c           |  3 +--
>  sha1dc_git.h           |  2 +-
>  t/t1007-hash-object.sh | 39 +++++++++++++++++++++++++++++++++++++++
>  9 files changed, 56 insertions(+), 18 deletions(-)

Will queue.  Thanks.

>
>
> base-commit: 700432b2ba22603a0bcb71475c9c333d17c9b0d1
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2138%2Fdscho%2FPhilipOakley%2Fhashliteral_t-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2138/dscho/PhilipOakley/hashliteral_t-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/2138
>
> Range-diff vs v1:
>
>  1:  84e1cd0aa0 = 1:  9c01bac407 hash-object: demonstrate a >4GB/LLP64 problem
>  2:  809d83e46f ! 2:  aa5859c14f object-file.c: use size_t for header lengths
>      @@ Commit message

By the way, how is range-diff driven via GGG?  After applying these
patches on the same base commit, my "git range-diff v1...v2" invocation
punts on matching step 2 and I do not get a comparison like this
unless I give --creation-factor=<large number> from the command line.



^ permalink raw reply

* Re: [PATCH v3] read_gitfile(): simplify NOT_A_REPO error message
From: Junio C Hamano @ 2026-06-16 15:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git, Tian Yuchen
In-Reply-To: <20260616144554.GA2305974@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Tue, Jun 16, 2026 at 07:25:02AM -0700, Junio C Hamano wrote:
>
>> >     +@@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'git dirs of sibling submodules must not be nested' '
>> >     + test_expect_success 'submodule git dir nesting detection must work with parallel cloning' '
>> >     + 	test_must_fail git clone --recurse-submodules --jobs=2 nested clone_parallel 2>err &&
>> >     + 	cat err &&
>> >     +-	grep -E "(already exists|is inside git dir|not a git repository)" err &&
>> >     ++	grep -E "(already exists|is inside git dir|does not point to a valid repository)" err &&
>> 
>> A few things.
>> 
>>  * Will we be happy to see only one of these possibilities, or do we
>>    expect to see these once for each kind?
>
> I imagine it is only one. This all comes from 9cf8547320 (clone: prevent
> clashing git dirs when cloning submodule in parallel, 2024-01-28), and
> it is expecting the nested path to cause a failure. Which failure I
> guess depends on the racy ordering. If we create the inner one first,
> then we probably get "already exists", and if the outer one, then "is
> inside git dir". I don't know exactly what sequence yields the
> NOT_A_REPO message.
>
> But none of that is changing in this patch, just what the user-visible
> text is for the NOT_A_REPO case.
>
> I did briefly wonder if we might see "not a git repository" from a
> _different_ code path, and need to catch it along with the new message.
> But running successfully with --stress implies that we never see the old
> one anymore.

I see.  Thanks.

>
>>  * a recently started in-flight topic tries to catch bare "grep" and
>>    fails until you write test_grep X-<.
>
> Yeah. This will create a merge conflict for you, but hopefully the
> resolution should be obvious. I don't think it makes sense to fix here,
> as it's orthogonal to the purpose of the patch.

Yup.  I agree that, given that others in the same script will be
updated by that other topic to conflict with this change, it would
not make sense to do the same changes here.

Thanks.

^ permalink raw reply

* Re: [PATCH 4/4] builtin/refs: add "rename" subcommand
From: Junio C Hamano @ 2026-06-16 14:53 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <20260616-pks-refs-writing-subcommands-v1-4-9f5219b6109d@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> +static int cmd_refs_rename(int argc, const char **argv, const char *prefix,
> +			   struct repository *repo)
> +{
> +	static char const * const refs_rename_usage[] = {
> +		REFS_RENAME_USAGE,
> +		NULL
> +	};
> +	const char *message = NULL;
> +	struct option opts[] = {
> +		OPT_STRING(0, "message", &message, N_("reason"),
> +			   N_("reason of the update")),
> +		OPT_END(),
> +	};
> +	const char *oldref, *newref;
> +
> +	argc = parse_options(argc, argv, prefix, opts, refs_rename_usage, 0);
> +	if (argc != 2)
> +		usage(_("rename requires old and new reference name"));
> +	if (message && !*message)
> +		die(_("refusing to perform update with empty message"));
> +
> +	oldref = argv[0];
> +	newref = argv[1];
> +
> +	if (check_refname_format(oldref, 0))
> +		die(_("invalid ref format: %s"), oldref);
> +	if (check_refname_format(newref, 0))
> +		die(_("invalid ref format: %s"), newref);

Do we want to quote the value?  What other subcommands do in "git refs"?

> +	if (!refs_ref_exists(get_main_ref_store(repo), oldref))
> +		die(_("reference does not exist: '%s'"), oldref);
> +	if (refs_ref_exists(get_main_ref_store(repo), newref))
> +		die(_("reference already exists: '%s'"), newref);
> +
> +	return refs_rename_ref(get_main_ref_store(repo), oldref, newref, message);
> +}

I suspect that my version shared the same issue, but doesn't
refs_rename_ref() return -1 for failure, which we may want to turn
to positive 1 before returning?

This is a tangent but git.c:handle_builtin() that calls
git.c:run_builtin() may want to do the "negative return? flip the
polarity" conversion to make this worry go away.  I dunno what such
a change would break, though.

If we rename a ref that does not have a reflog, would it leave the
ref under the new name without reflog, or would we get a reflog with
a single entry that marks the fact the old ref was renamed into the
new ref?  Should that be controlled via --create-reflog option?

^ permalink raw reply

* Re: [PATCH 3/4] builtin/refs: add "update" subcommand
From: Junio C Hamano @ 2026-06-16 14:52 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <20260616-pks-refs-writing-subcommands-v1-3-9f5219b6109d@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

>  git refs delete [--message=<reason>] [--no-deref] <ref> [<oldvalue>]
> +git refs update [--message=<reason>] [--no-deref] [--create-reflog] <ref> <new-value> [<old-value>]

"<old-value> vs <new-value>" is good, we should update "delete" to
use "<old-value>" to match.


>  DESCRIPTION
>  -----------
> @@ -58,6 +59,12 @@ delete::
>  	reference is only deleted after verifying that it currently contains
>  	`<oldvalue>`.
>  
> +update::
> +	Update the given reference to point at `<new-value>`. This subcommand
> +	mirrors `git update-ref` (see linkgit:git-update-ref[1]). When
> +	`<old-value>` is given, the reference is only updated after verifying
> +	that it currently contains `<old-value>`.

As to the lack of "create", among the two potential changes, I have
a slight preference for adding "create" and failing "update" that
does not refer to an existing ref.  If we go that route, the
"--create-reflog" option should move to "create", as "update" will
never be used to create a new ref.


> +	if (repo_get_oid_with_flags(repo, argv[1], &newoid,
> +				    GET_OID_SKIP_AMBIGUITY_CHECK))
> +		die(_("invalid new object ID: %s"), argv[1]);
> +	if (argc == 3 &&
> +	    repo_get_oid_with_flags(repo, argv[2], &oldoid,
> +				    GET_OID_SKIP_AMBIGUITY_CHECK))
> +		die(_("invalid old object ID: %s"), argv[2]);

On the "delete" side, these messages quote the object name, i.e.,

			die(_("invalid old object ID: '%s'"), argv[1]);

We should be consistent.

^ permalink raw reply

* [PATCH v2 6/6] hash-object: add a >4GB/LLP64 test case using filtered input
From: Philip Oakley via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin,
	Philip Oakley
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

To verify that the `clean` side of the `clean`/`smudge` filter code is
correct with regards to LLP64 (read: to ensure that `size_t` is used
instead of `unsigned long`), here is a test case using a trivial filter,
specifically _not_ writing anything to the object store to limit the
scope of the test case.

As in previous commits, the `big` file from previous test cases is
reused if available, to save setup time, otherwise re-generated.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1007-hash-object.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index f96c29ce68..4bc82dd968 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -285,4 +285,16 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
 	test_cmp expect actual
 '
 
+# This clean filter does nothing, other than excercising the interface.
+# We ensure that cleaning doesn't mangle large files on 64-bit Windows.
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
+		'hash filtered files over 4GB correctly' '
+	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
+	test_oid large5GB >expect &&
+	test_config filter.null-filter.clean "cat" &&
+	echo "big filter=null-filter" >.gitattributes &&
+	git hash-object -- big >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v2 5/6] hash-object: add another >4GB/LLP64 test case
From: Philip Oakley via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin,
	Philip Oakley
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

To complement the `--stdin` and `--literally` test cases that verify
that we can hash files larger than 4GB on 64-bit platforms using the
LLP64 data model, here is a test case that exercises `hash-object`
_without_ any options.

Just as before, we use the `big` file from the previous test case if it
exists to save on setup time, otherwise generate it.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1007-hash-object.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index bcae3fc54c..f96c29ce68 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -277,4 +277,12 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
 	test_cmp expect actual
 '
 
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
+		'files over 4GB hash correctly' '
+	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
+	test_oid large5GB >expect &&
+	git hash-object -- big >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 4/6] hash-object --stdin: verify that it works with >4GB/LLP64
From: Philip Oakley via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin,
	Philip Oakley
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

Just like the `hash-object --literally` code path, the `--stdin` code
path also needs to use `size_t` instead of `unsigned long` to represent
memory sizes, otherwise it would cause problems on platforms using the
LLP64 data model (such as Windows).

To limit the scope of the test case, the object is explicitly not
written to the object store, nor are any filters applied.

The `big` file from the previous test case is reused to save setup time;
To avoid relying on that side effect, it is generated if it does not
exist (e.g. when running via `sh t1007-*.sh --long --run=1,41`).

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1007-hash-object.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index f028a1cbcc..bcae3fc54c 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -269,4 +269,12 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
 	test_cmp expect actual
 '
 
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
+		'files over 4GB hash correctly via --stdin' '
+	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
+	test_oid large5GB >expect &&
+	git hash-object --stdin <big >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 3/6] hash algorithms: use size_t for section lengths
From: Philip Oakley via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin,
	Philip Oakley
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

Continue walking the code path for the >4GB `hash-object --literally`
test to the hash algorithm step for LLP64 systems.

This patch lets the SHA1DC code use `size_t`, making it compatible with
LLP64 data models (as used e.g. by Windows).

The interested reader of this patch will note that we adjust the
signature of the `git_SHA1DCUpdate()` function without updating _any_
call site. This certainly puzzled at least one reviewer already, so here
is an explanation:

This function is never called directly, but always via the macro
`platform_SHA1_Update`, which is usually called via the macro
`git_SHA1_Update`. However, we never call `git_SHA1_Update()` directly
in `struct git_hash_algo`. Instead, we call `git_hash_sha1_update()`,
which is defined thusly:

    static void git_hash_sha1_update(git_hash_ctx *ctx,
                                     const void *data, size_t len)
    {
        git_SHA1_Update(&ctx->sha1, data, len);
    }

i.e. it contains an implicit downcast from `size_t` to `unsigned long`
(before this here patch). With this patch, there is no downcast anymore.

With this patch, finally, the t1007-hash-object.sh "files over 4GB hash
literally" test case is fixed.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 object-file.c          | 4 ++--
 sha1dc_git.c           | 3 +--
 sha1dc_git.h           | 2 +-
 t/t1007-hash-object.sh | 2 +-
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index dccbe0fc3e..0056c369ce 100644
--- a/object-file.c
+++ b/object-file.c
@@ -316,7 +316,7 @@ int parse_loose_header(const char *hdr, struct object_info *oi)
 }
 
 static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
-			     const void *buf, unsigned long len,
+			     const void *buf, size_t len,
 			     struct object_id *oid,
 			     char *hdr, size_t *hdrlen)
 {
@@ -336,7 +336,7 @@ void write_object_file_prepare(const struct git_hash_algo *algo,
 	/* Generate the header */
 	*hdrlen = format_object_header(hdr, *hdrlen, type, len);
 
-	/* Sha1.. */
+	/* Hash (function pointers) computation */
 	hash_object_body(algo, &c, buf, len, oid, hdr, hdrlen);
 }
 
diff --git a/sha1dc_git.c b/sha1dc_git.c
index 9b675a046e..fe58d7962a 100644
--- a/sha1dc_git.c
+++ b/sha1dc_git.c
@@ -27,10 +27,9 @@ void git_SHA1DCFinal(unsigned char hash[20], SHA1_CTX *ctx)
 /*
  * Same as SHA1DCUpdate, but adjust types to match git's usual interface.
  */
-void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, unsigned long len)
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, size_t len)
 {
 	const char *data = vdata;
-	/* We expect an unsigned long, but sha1dc only takes an int */
 	while (len > INT_MAX) {
 		SHA1DCUpdate(ctx, data, INT_MAX);
 		data += INT_MAX;
diff --git a/sha1dc_git.h b/sha1dc_git.h
index f6f880cabe..0bcf1aa84b 100644
--- a/sha1dc_git.h
+++ b/sha1dc_git.h
@@ -15,7 +15,7 @@ void git_SHA1DCInit(SHA1_CTX *);
 #endif
 
 void git_SHA1DCFinal(unsigned char [20], SHA1_CTX *);
-void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, unsigned long len);
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, size_t len);
 
 #define platform_SHA_IS_SHA1DC /* used by "test-tool sha1-is-sha1dc" */
 
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 7867fd1dbf..f028a1cbcc 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -261,7 +261,7 @@ test_expect_success '--stdin outside of repository (uses default hash)' '
 	test_cmp expect actual
 '
 
-test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
 		'files over 4GB hash literally' '
 	test-tool genzeros $((5*1024*1024*1024)) >big &&
 	test_oid large5GB >expect &&
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 2/6] object-file.c: use size_t for header lengths
From: Philip Oakley via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin,
	Philip Oakley
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

Continue walking the code path for the >4GB `hash-object --literally`
test. The `hash_object_file_literally()` function internally uses both
`hash_object_file()` and `write_object_file_prepare()`. Both function
signatures use `unsigned long` rather than `size_t` for the mem buffer
sizes. Use `size_t` instead, for LLP64 compatibility.

While at it, convert those function's object's header buffer length to
`size_t` for consistency. The value is already upcast to `uintmax_t` for
print format compatibility.

Note: The hash-object test still does not pass. A subsequent commit
continues to walk the call tree's lower level hash functions to identify
further fixes.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 object-file.c         | 10 +++++-----
 object-file.h         |  6 +++---
 odb/source-files.c    |  2 +-
 odb/source-inmemory.c |  2 +-
 odb/source-loose.c    |  4 ++--
 odb/source.h          |  2 +-
 6 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/object-file.c b/object-file.c
index 9afa842da2..dccbe0fc3e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -318,7 +318,7 @@ int parse_loose_header(const char *hdr, struct object_info *oi)
 static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
 			     const void *buf, unsigned long len,
 			     struct object_id *oid,
-			     char *hdr, int *hdrlen)
+			     char *hdr, size_t *hdrlen)
 {
 	algo->init_fn(c);
 	git_hash_update(c, hdr, *hdrlen);
@@ -327,9 +327,9 @@ static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_c
 }
 
 void write_object_file_prepare(const struct git_hash_algo *algo,
-			       const void *buf, unsigned long len,
+			       const void *buf, size_t len,
 			       enum object_type type, struct object_id *oid,
-			       char *hdr, int *hdrlen)
+			       char *hdr, size_t *hdrlen)
 {
 	struct git_hash_ctx c;
 
@@ -472,11 +472,11 @@ out:
 }
 
 void hash_object_file(const struct git_hash_algo *algo, const void *buf,
-		      unsigned long len, enum object_type type,
+		      size_t len, enum object_type type,
 		      struct object_id *oid)
 {
 	char hdr[MAX_HEADER_LEN];
-	int hdrlen = sizeof(hdr);
+	size_t hdrlen = sizeof(hdr);
 
 	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
 }
diff --git a/object-file.h b/object-file.h
index 528c4e6e69..4c87cd160b 100644
--- a/object-file.h
+++ b/object-file.h
@@ -131,12 +131,12 @@ int finalize_object_file_flags(struct repository *repo,
 			       enum finalize_object_file_flags flags);
 
 void hash_object_file(const struct git_hash_algo *algo, const void *buf,
-		      unsigned long len, enum object_type type,
+		      size_t len, enum object_type type,
 		      struct object_id *oid);
 void write_object_file_prepare(const struct git_hash_algo *algo,
-			       const void *buf, unsigned long len,
+			       const void *buf, size_t len,
 			       enum object_type type, struct object_id *oid,
-			       char *hdr, int *hdrlen);
+			       char *hdr, size_t *hdrlen);
 int write_loose_object(struct odb_source_loose *loose,
 		       const struct object_id *oid, char *hdr,
 		       int hdrlen, const void *buf, unsigned long len,
diff --git a/odb/source-files.c b/odb/source-files.c
index 5bdd042922..3b1261eba9 100644
--- a/odb/source-files.c
+++ b/odb/source-files.c
@@ -159,7 +159,7 @@ static int odb_source_files_freshen_object(struct odb_source *source,
 }
 
 static int odb_source_files_write_object(struct odb_source *source,
-					 const void *buf, unsigned long len,
+					 const void *buf, size_t len,
 					 enum object_type type,
 					 struct object_id *oid,
 					 struct object_id *compat_oid,
diff --git a/odb/source-inmemory.c b/odb/source-inmemory.c
index e004566d76..7f1b6f4636 100644
--- a/odb/source-inmemory.c
+++ b/odb/source-inmemory.c
@@ -227,7 +227,7 @@ static int odb_source_inmemory_count_objects(struct odb_source *source,
 }
 
 static int odb_source_inmemory_write_object(struct odb_source *source,
-					    const void *buf, unsigned long len,
+					    const void *buf, size_t len,
 					    enum object_type type,
 					    struct object_id *oid,
 					    struct object_id *compat_oid UNUSED,
diff --git a/odb/source-loose.c b/odb/source-loose.c
index 7d7ea2fb84..4cfa5b23fc 100644
--- a/odb/source-loose.c
+++ b/odb/source-loose.c
@@ -591,7 +591,7 @@ static int odb_source_loose_freshen_object(struct odb_source *source,
 }
 
 static int odb_source_loose_write_object(struct odb_source *source,
-					 const void *buf, unsigned long len,
+					 const void *buf, size_t len,
 					 enum object_type type, struct object_id *oid,
 					 struct object_id *compat_oid_in,
 					 enum odb_write_object_flags flags)
@@ -601,7 +601,7 @@ static int odb_source_loose_write_object(struct odb_source *source,
 	const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
 	struct object_id compat_oid;
 	char hdr[MAX_HEADER_LEN];
-	int hdrlen = sizeof(hdr);
+	size_t hdrlen = sizeof(hdr);
 
 	/* Generate compat_oid */
 	if (compat) {
diff --git a/odb/source.h b/odb/source.h
index 2192a101b8..1c65a05e2c 100644
--- a/odb/source.h
+++ b/odb/source.h
@@ -199,7 +199,7 @@ struct odb_source {
 	 * return 0 on success, a negative error code otherwise.
 	 */
 	int (*write_object)(struct odb_source *source,
-			    const void *buf, unsigned long len,
+			    const void *buf, size_t len,
 			    enum object_type type,
 			    struct object_id *oid,
 			    struct object_id *compat_oid,
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 1/6] hash-object: demonstrate a >4GB/LLP64 problem
From: Philip Oakley via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin,
	Philip Oakley
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

On LLP64 systems, such as Windows, the size of `long`, `int`, etc. is
only 32 bits (for backward compatibility). Git's use of `unsigned long`
for file memory sizes in many places, rather than size_t, limits the
handling of large files on LLP64 systems (commonly given as `>4GB`).

Provide a minimum test for handling a >4GB file. The `hash-object`
command, with the  `--literally` and without `-w` option avoids
writing the object, either loose or packed. This avoids the code paths
hitting the `bigFileThreshold` config test code, the zlib code, and the
pack code.

Subsequent patches will walk the test's call chain, converting types to
`size_t` (which is larger in LLP64 data models) where appropriate.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1007-hash-object.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index de076293b6..7867fd1dbf 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -49,6 +49,9 @@ test_expect_success 'setup' '
 
 	example sha1:ddd3f836d3e3fbb7ae289aa9ae83536f76956399
 	example sha256:b44fe1fe65589848253737db859bd490453510719d7424daab03daf0767b85ae
+
+	large5GB sha1:0be2be10a4c8764f32c4bf372a98edc731a4b204
+	large5GB sha256:dc18ca621300c8d3cfa505a275641ebab00de189859e022a975056882d313e64
 	EOF
 '
 
@@ -258,4 +261,12 @@ test_expect_success '--stdin outside of repository (uses default hash)' '
 	test_cmp expect actual
 '
 
+test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
+		'files over 4GB hash literally' '
+	test-tool genzeros $((5*1024*1024*1024)) >big &&
+	test_oid large5GB >expect &&
+	git hash-object --stdin --literally <big >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 0/6] Support hashing objects larger than 4GB on Windows
From: Johannes Schindelin via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin
In-Reply-To: <pull.2138.git.1780593313.gitgitgadget@gmail.com>

Philip Oakley has contributed these patches ~4.5 years ago, and they have
been carried in Git for Windows ever since.

Now that there are already other patch series flying around that try to
address various aspects about >4GB objects (which aren't handled well by Git
until it stops forcing unsigned long to do size_t's job), it seems a good
time to upstream these patches, too, at long last.

Changes since v1:

 * Rebased to current master to resolve the conflicts with
   ps/odb-source-loose
 * Dropped the !LONG_IS_64BIT prereq from the added/touched tests, as it is
   now no longer needed

Philip Oakley (6):
  hash-object: demonstrate a >4GB/LLP64 problem
  object-file.c: use size_t for header lengths
  hash algorithms: use size_t for section lengths
  hash-object --stdin: verify that it works with >4GB/LLP64
  hash-object: add another >4GB/LLP64 test case
  hash-object: add a >4GB/LLP64 test case using filtered input

 object-file.c          | 14 +++++++-------
 object-file.h          |  6 +++---
 odb/source-files.c     |  2 +-
 odb/source-inmemory.c  |  2 +-
 odb/source-loose.c     |  4 ++--
 odb/source.h           |  2 +-
 sha1dc_git.c           |  3 +--
 sha1dc_git.h           |  2 +-
 t/t1007-hash-object.sh | 39 +++++++++++++++++++++++++++++++++++++++
 9 files changed, 56 insertions(+), 18 deletions(-)


base-commit: 700432b2ba22603a0bcb71475c9c333d17c9b0d1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2138%2Fdscho%2FPhilipOakley%2Fhashliteral_t-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2138/dscho/PhilipOakley/hashliteral_t-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2138

Range-diff vs v1:

 1:  84e1cd0aa0 = 1:  9c01bac407 hash-object: demonstrate a >4GB/LLP64 problem
 2:  809d83e46f ! 2:  aa5859c14f object-file.c: use size_t for header lengths
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## object-file.c ##
     -@@ object-file.c: int odb_source_loose_read_object_info(struct odb_source *source,
     +@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi)
       static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
       			     const void *buf, unsigned long len,
       			     struct object_id *oid,
     @@ object-file.c: int odb_source_loose_read_object_info(struct odb_source *source,
      @@ object-file.c: static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_c
       }
       
     - static void write_object_file_prepare(const struct git_hash_algo *algo,
     --				      const void *buf, unsigned long len,
     -+				      const void *buf, size_t len,
     - 				      enum object_type type, struct object_id *oid,
     --				      char *hdr, int *hdrlen)
     -+				      char *hdr, size_t *hdrlen)
     + void write_object_file_prepare(const struct git_hash_algo *algo,
     +-			       const void *buf, unsigned long len,
     ++			       const void *buf, size_t len,
     + 			       enum object_type type, struct object_id *oid,
     +-			       char *hdr, int *hdrlen)
     ++			       char *hdr, size_t *hdrlen)
       {
       	struct git_hash_ctx c;
       
     @@ object-file.c: out:
       
       	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
       }
     -@@ object-file.c: cleanup:
     +
     + ## object-file.h ##
     +@@ object-file.h: int finalize_object_file_flags(struct repository *repo,
     + 			       enum finalize_object_file_flags flags);
     + 
     + void hash_object_file(const struct git_hash_algo *algo, const void *buf,
     +-		      unsigned long len, enum object_type type,
     ++		      size_t len, enum object_type type,
     + 		      struct object_id *oid);
     + void write_object_file_prepare(const struct git_hash_algo *algo,
     +-			       const void *buf, unsigned long len,
     ++			       const void *buf, size_t len,
     + 			       enum object_type type, struct object_id *oid,
     +-			       char *hdr, int *hdrlen);
     ++			       char *hdr, size_t *hdrlen);
     + int write_loose_object(struct odb_source_loose *loose,
     + 		       const struct object_id *oid, char *hdr,
     + 		       int hdrlen, const void *buf, unsigned long len,
     +
     + ## odb/source-files.c ##
     +@@ odb/source-files.c: static int odb_source_files_freshen_object(struct odb_source *source,
     + }
     + 
     + static int odb_source_files_write_object(struct odb_source *source,
     +-					 const void *buf, unsigned long len,
     ++					 const void *buf, size_t len,
     + 					 enum object_type type,
     + 					 struct object_id *oid,
     + 					 struct object_id *compat_oid,
     +
     + ## odb/source-inmemory.c ##
     +@@ odb/source-inmemory.c: static int odb_source_inmemory_count_objects(struct odb_source *source,
       }
       
     - int odb_source_loose_write_object(struct odb_source *source,
     --				  const void *buf, unsigned long len,
     -+				  const void *buf, size_t len,
     - 				  enum object_type type, struct object_id *oid,
     - 				  struct object_id *compat_oid_in,
     - 				  enum odb_write_object_flags flags)
     -@@ object-file.c: int odb_source_loose_write_object(struct odb_source *source,
     + static int odb_source_inmemory_write_object(struct odb_source *source,
     +-					    const void *buf, unsigned long len,
     ++					    const void *buf, size_t len,
     + 					    enum object_type type,
     + 					    struct object_id *oid,
     + 					    struct object_id *compat_oid UNUSED,
     +
     + ## odb/source-loose.c ##
     +@@ odb/source-loose.c: static int odb_source_loose_freshen_object(struct odb_source *source,
     + }
     + 
     + static int odb_source_loose_write_object(struct odb_source *source,
     +-					 const void *buf, unsigned long len,
     ++					 const void *buf, size_t len,
     + 					 enum object_type type, struct object_id *oid,
     + 					 struct object_id *compat_oid_in,
     + 					 enum odb_write_object_flags flags)
     +@@ odb/source-loose.c: static int odb_source_loose_write_object(struct odb_source *source,
       	const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
       	struct object_id compat_oid;
       	char hdr[MAX_HEADER_LEN];
     @@ object-file.c: int odb_source_loose_write_object(struct odb_source *source,
       	/* Generate compat_oid */
       	if (compat) {
      
     - ## object-file.h ##
     -@@ object-file.h: int odb_source_loose_freshen_object(struct odb_source *source,
     - 				    const struct object_id *oid);
     - 
     - int odb_source_loose_write_object(struct odb_source *source,
     --				  const void *buf, unsigned long len,
     -+				  const void *buf, size_t len,
     - 				  enum object_type type, struct object_id *oid,
     - 				  struct object_id *compat_oid_in,
     - 				  enum odb_write_object_flags flags);
     -@@ object-file.h: int finalize_object_file_flags(struct repository *repo,
     - 			       enum finalize_object_file_flags flags);
     - 
     - void hash_object_file(const struct git_hash_algo *algo, const void *buf,
     --		      unsigned long len, enum object_type type,
     -+		      size_t len, enum object_type type,
     - 		      struct object_id *oid);
     - 
     - /* Helper to check and "touch" a file */
     + ## odb/source.h ##
     +@@ odb/source.h: struct odb_source {
     + 	 * return 0 on success, a negative error code otherwise.
     + 	 */
     + 	int (*write_object)(struct odb_source *source,
     +-			    const void *buf, unsigned long len,
     ++			    const void *buf, size_t len,
     + 			    enum object_type type,
     + 			    struct object_id *oid,
     + 			    struct object_id *compat_oid,
 3:  253d6f8004 ! 3:  b401eb490f hash algorithms: use size_t for section lengths
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## object-file.c ##
     -@@ object-file.c: int odb_source_loose_read_object_info(struct odb_source *source,
     +@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi)
       }
       
       static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
     @@ object-file.c: int odb_source_loose_read_object_info(struct odb_source *source,
       			     struct object_id *oid,
       			     char *hdr, size_t *hdrlen)
       {
     -@@ object-file.c: static void write_object_file_prepare(const struct git_hash_algo *algo,
     +@@ object-file.c: void write_object_file_prepare(const struct git_hash_algo *algo,
       	/* Generate the header */
       	*hdrlen = format_object_header(hdr, *hdrlen, type, len);
       
     @@ t/t1007-hash-object.sh: test_expect_success '--stdin outside of repository (uses
       '
       
      -test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     -+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     ++test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
       		'files over 4GB hash literally' '
       	test-tool genzeros $((5*1024*1024*1024)) >big &&
       	test_oid large5GB >expect &&
 4:  ba629a3f03 ! 4:  411727336a hash-object --stdin: verify that it works with >4GB/LLP64
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## t/t1007-hash-object.sh ##
     -@@ t/t1007-hash-object.sh: test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     +@@ t/t1007-hash-object.sh: test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
       	test_cmp expect actual
       '
       
     -+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     ++test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
      +		'files over 4GB hash correctly via --stdin' '
      +	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
      +	test_oid large5GB >expect &&
 5:  f48d570bba ! 5:  e6bb4e6228 hash-object: add another >4GB/LLP64 test case
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## t/t1007-hash-object.sh ##
     -@@ t/t1007-hash-object.sh: test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     +@@ t/t1007-hash-object.sh: test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
       	test_cmp expect actual
       '
       
     -+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     ++test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
      +		'files over 4GB hash correctly' '
      +	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
      +	test_oid large5GB >expect &&
 6:  8a6beeb16d ! 6:  568807ac34 hash-object: add a >4GB/LLP64 test case using filtered input
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## t/t1007-hash-object.sh ##
     -@@ t/t1007-hash-object.sh: test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     +@@ t/t1007-hash-object.sh: test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
       	test_cmp expect actual
       '
       
      +# This clean filter does nothing, other than excercising the interface.
      +# We ensure that cleaning doesn't mangle large files on 64-bit Windows.
     -+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     ++test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
      +		'hash filtered files over 4GB correctly' '
      +	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
      +	test_oid large5GB >expect &&

-- 
gitgitgadget

^ permalink raw reply

* Re: [PATCH 3/6] hash algorithms: use size_t for section lengths
From: Johannes Schindelin @ 2026-06-16 14:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Philip Oakley via GitGitGadget, git, Philip Oakley
In-Reply-To: <ai-5VmawU2MRiAHQ@pks.im>

Hi Patrick,

On Tue, 16 Jun 2026, Patrick Steinhardt wrote:

> On Thu, Jun 04, 2026 at 05:15:09PM +0000, Philip Oakley via GitGitGadget wrote:
> > diff --git a/object-file.c b/object-file.c
> > index 1f5f9daf24..c648cecd80 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -581,7 +581,7 @@ static void write_object_file_prepare(const struct git_hash_algo *algo,
> >  	/* Generate the header */
> >  	*hdrlen = format_object_header(hdr, *hdrlen, type, len);
> >  
> > -	/* Sha1.. */
> > +	/* Hash (function pointers) computation */
> >  	hash_object_body(algo, &c, buf, len, oid, hdr, hdrlen);
> >  }
> >  
> 
> Thanks for updating this comment while at it :)

It wasn't my idea, it was Claude Opus'. I would have left it as-is, but
then decided that it's actually a good change and not worth splitting out
into a separate commit.

> > diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
> > index 7867fd1dbf..10382a815e 100755
> > --- a/t/t1007-hash-object.sh
> > +++ b/t/t1007-hash-object.sh
> > @@ -261,7 +261,7 @@ test_expect_success '--stdin outside of repository (uses default hash)' '
> >  	test_cmp expect actual
> >  '
> >  
> > -test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
> > +test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
> >  		'files over 4GB hash literally' '
> >  	test-tool genzeros $((5*1024*1024*1024)) >big &&
> >  	test_oid large5GB >expect &&
> 
> Previously we required `!LONG_IS_64BIT`, because the test would have
> succeeded on platforms where it is 64 bit wide. But now that this test
> works on all platforms I rather wonder whether we should completely drop
> that prerequisite here, as we expect it to pass regardless of whether or
> not long is 64 bit now.

Good point!

Thank you for the review,
Johannes

^ permalink raw reply

* Re: [PATCH 5/6] hash-object: add another >4GB/LLP64 test case
From: Johannes Schindelin @ 2026-06-16 14:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Philip Oakley via GitGitGadget, git, Philip Oakley
In-Reply-To: <ai-5ZvDgc4smGfGc@pks.im>

Hi Patrick,

On Tue, 16 Jun 2026, Patrick Steinhardt wrote:

> On Thu, Jun 04, 2026 at 05:15:11PM +0000, Philip Oakley via GitGitGadget wrote:
> > diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
> > index 59efee3aff..f2722380ee 100755
> > --- a/t/t1007-hash-object.sh
> > +++ b/t/t1007-hash-object.sh
> > @@ -277,4 +277,12 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
> >  	test_cmp expect actual
> >  '
> >  
> > +test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
> > +		'files over 4GB hash correctly' '
> > +	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
> > +	test_oid large5GB >expect &&
> > +	git hash-object -- big >actual &&
> > +	test_cmp expect actual
> > +'
> 
> Same comment here.

[Comment was the suggestion to drop the !LONG_IS_64BIT prerequisite]

Same comment here. [My reply: Good point!]

> Nit: I feel like we could've easily introduced all of these tests in the
> first commit.

Sure, but I actually liked the structuring by Philip when I accepted the
patches into Git for Windows, and I still do: The commits all have
slightly different concerns, and I love that the cognitive load is
lightened somewhat by keeping those concerns in separate patches with
separate commit messages.

Ciao,
Johannes

^ permalink raw reply

* Re: [PATCH 2/6] object-file.c: use size_t for header lengths
From: Johannes Schindelin @ 2026-06-16 14:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Philip Oakley via GitGitGadget, git, Philip Oakley
In-Reply-To: <ai-5XO9gsc_HdMFX@pks.im>

Hi Patrick,

On Mon, 15 Jun 2026, Patrick Steinhardt wrote:

> On Thu, Jun 04, 2026 at 05:15:08PM +0000, Philip Oakley via GitGitGadget wrote:
> > From: Philip Oakley <philipoakley@iee.email>
> > 
> > Continue walking the code path for the >4GB `hash-object --literally`
> > test. The `hash_object_file_literally()` function internally uses both
> > `hash_object_file()` and `write_object_file_prepare()`. Both function
> > signatures use `unsigned long` rather than `size_t` for the mem buffer
> > sizes. Use `size_t` instead, for LLP64 compatibility.
> > 
> > While at it, convert those function's object's header buffer length to
> > `size_t` for consistency. The value is already upcast to `uintmax_t` for
> > print format compatibility.
> 
> One thing I was wondering is whether we should rather migrate to a size
> that is consistent across different platforms. We could e.g. `typedef
> uint64_t objsize_t` and then use that going forward.

No, the point of `size_t` is to represent what the current platform can
handle in-memory. That cannot (and should not) be consolidated.

> I guess the question though is whether that'd buy us anything. In other
> words, are there any platforms that we care about where `size_t` is only
> 32 bit wide? And would such platforms even be able to handle such large
> objects?

There are ways to handle objects larger than 4GB on 32-bit platforms, via
streaming. In those cases, what you need is `off_t`, not `size_t`.

Obviously, there is a large class of problems with such setups. For
example, you can forget about efficiently reconstructing a large Git
object from a delta chain. If you cannot do that in-memory, trying to work
around that limitation merely opens up the user for a world of pain.

Ciao,
Johannes

^ permalink raw reply

* Re: [PATCH v3] read_gitfile(): simplify NOT_A_REPO error message
From: Jeff King @ 2026-06-16 14:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, Tian Yuchen
In-Reply-To: <xmqq7bnya7gh.fsf@gitster.g>

On Tue, Jun 16, 2026 at 07:25:02AM -0700, Junio C Hamano wrote:

> >     +@@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'git dirs of sibling submodules must not be nested' '
> >     + test_expect_success 'submodule git dir nesting detection must work with parallel cloning' '
> >     + 	test_must_fail git clone --recurse-submodules --jobs=2 nested clone_parallel 2>err &&
> >     + 	cat err &&
> >     +-	grep -E "(already exists|is inside git dir|not a git repository)" err &&
> >     ++	grep -E "(already exists|is inside git dir|does not point to a valid repository)" err &&
> 
> A few things.
> 
>  * Will we be happy to see only one of these possibilities, or do we
>    expect to see these once for each kind?

I imagine it is only one. This all comes from 9cf8547320 (clone: prevent
clashing git dirs when cloning submodule in parallel, 2024-01-28), and
it is expecting the nested path to cause a failure. Which failure I
guess depends on the racy ordering. If we create the inner one first,
then we probably get "already exists", and if the outer one, then "is
inside git dir". I don't know exactly what sequence yields the
NOT_A_REPO message.

But none of that is changing in this patch, just what the user-visible
text is for the NOT_A_REPO case.

I did briefly wonder if we might see "not a git repository" from a
_different_ code path, and need to catch it along with the new message.
But running successfully with --stress implies that we never see the old
one anymore.

>  * a recently started in-flight topic tries to catch bare "grep" and
>    fails until you write test_grep X-<.

Yeah. This will create a merge conflict for you, but hopefully the
resolution should be obvious. I don't think it makes sense to fix here,
as it's orthogonal to the purpose of the patch.

-Peff

^ permalink raw reply

* Re: [PATCH v3] read_gitfile(): simplify NOT_A_REPO error message
From: Junio C Hamano @ 2026-06-16 14:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git, Tian Yuchen
In-Reply-To: <20260616123516.GA2301231@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Tue, Jun 16, 2026 at 07:19:20AM -0400, Jeff King wrote:
>
>> Here it is.

Thanks.

>     +@@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'git dirs of sibling submodules must not be nested' '
>     + test_expect_success 'submodule git dir nesting detection must work with parallel cloning' '
>     + 	test_must_fail git clone --recurse-submodules --jobs=2 nested clone_parallel 2>err &&
>     + 	cat err &&
>     +-	grep -E "(already exists|is inside git dir|not a git repository)" err &&
>     ++	grep -E "(already exists|is inside git dir|does not point to a valid repository)" err &&

A few things.

 * Will we be happy to see only one of these possibilities, or do we
   expect to see these once for each kind?

 * a recently started in-flight topic tries to catch bare "grep" and
   fails until you write test_grep X-<.

> We also racily trigger this in t7450. During parallel cloning we might
> see one of several errors, including this one. And so we must update
> that message, too (you can otherwise find the failure pretty quickly by
> running t7450 with --stress).


^ permalink raw reply

* Re: [PATCH] Add a test about broken notes handling on rebase
From: Phillip Wood @ 2026-06-16 13:12 UTC (permalink / raw)
  To: Uwe Kleine-König, git
In-Reply-To: <20260612143952.3281115-2-u.kleine-koenig@baylibre.com>

On 12/06/2026 15:39, Uwe Kleine-König wrote:
> When a commit disappears during rebase because the patch content is
> already there (but not by the same patch in which case the commit would
> be skipped) the notes of that disappearing commit still survives and is
> added to the (rebased) parent of the disappearing commit.
> 
> So with the commit graph
> 
>   A -- B -- C
>    `
>     `-BD
> 
> where BD includes the changes done in B, when rebasing C on top of BD,
> the note for B should disappear and not be added to BD.
> 
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
> ---
> Hello,
> 
> this is a behaviour of git that really bothers me when working on big
> patch series. I use notes to track the Message-Id of the patches when I
> send them out. Then when rebasing to a newer upstream version, the
> tracking gets confused because the Message-Id notes end up on commits
> that were not sent out yet (or I got two Message-Ids in them).
> 
> I reported that already back in 2023[1],

That thread includes a suggestion on how to fix it if anyone reading 
this is interesting in working on it.

> but obviously not in a way that
> resulted in a fix. So I'm trying again with a patch that adds a failing
> test.

I'm not sure carrying this test makes it any more likely that it will be 
fixed, though your mail might get someone interested in fixing it. Don't 
we already have some relevant tests t3400 rather than adding a whole new 
file for a single test?

Thanks

Phillip

> Best regards
> Uwe
> 
> [1] https://lore.kernel.org/git/20230530092155.3zbb5uxa7eisdzxb@pengutronix.de/
> 
>   t/meson.build           |  1 +
>   t/t3322-notes-rebase.sh | 35 +++++++++++++++++++++++++++++++++++
>   2 files changed, 36 insertions(+)
>   create mode 100644 t/t3322-notes-rebase.sh
> 
> diff --git a/t/meson.build b/t/meson.build
> index c5832fee0535..6927bd9c794f 100644
> --- a/t/meson.build
> +++ b/t/meson.build
> @@ -358,6 +358,7 @@ integration_tests = [
>     't3311-notes-merge-fanout.sh',
>     't3320-notes-merge-worktrees.sh',
>     't3321-notes-stripspace.sh',
> +  't3322-notes-rebase.sh',
>     't3400-rebase.sh',
>     't3401-rebase-and-am-rename.sh',
>     't3402-rebase-merge.sh',
> diff --git a/t/t3322-notes-rebase.sh b/t/t3322-notes-rebase.sh
> new file mode 100644
> index 000000000000..64c40a523b50
> --- /dev/null
> +++ b/t/t3322-notes-rebase.sh
> @@ -0,0 +1,35 @@
> +#!/bin/sh
> +
> +test_description='Test notes on rebase'
> +
> +. ./test-lib.sh
> +
> +test_expect_success setup '
> +	git init &&
> +	echo A > A &&
> +	git add A &&
> +	git commit -m A &&
> +	git branch branch &&
> +	echo B > B &&
> +	git add B &&
> +	git commit -m B &&
> +	git notes add -m "This is B" @ &&
> +	echo C > C &&
> +	git add C &&
> +	git commit -m C &&
> +	git checkout branch &&
> +	echo B > B &&
> +	echo D > D &&
> +	git add B D &&
> +	git commit -m BD
> +'
> +
> +test_expect_success 'rebase B + C on top of BD' '
> +	git rebase @ master
> +'
> +
> +test_expect_failure 'assert there is no note on BD' '
> +	git notes show branch
> +'
> +
> +test_done
> 
> base-commit: 3e65291872de10c3f0bf05ea8c24187e7a71ebf0


^ permalink raw reply

* Re: [GSoC Patch v5 2/4] rev-parse: use append_formatted_path() for path formatting
From: Phillip Wood @ 2026-06-16 13:08 UTC (permalink / raw)
  To: K Jayatheerth, git
  Cc: jltobler, lucasseikioshiro, gitster, phillip.wood, sandals,
	kumarayushjha123, a3205153416
In-Reply-To: <20260616044953.184806-3-jayatheerthkulkarni2005@gmail.com>

On 16/06/2026 05:49, K Jayatheerth wrote:
> -static void print_path(const char *path, const char *prefix, enum format_type format, enum default_type def)
> +static void print_path(const char *path, const char *prefix,
> +		       enum path_format arg_path_format, enum path_format def_format)
>   {
> -	char *cwd = NULL;
> -	/*
> -	 * We don't ever produce a relative path if prefix is NULL, so set the
> -	 * prefix to the current directory so that we can produce a relative
> -	 * path whenever possible.  If we're using RELATIVE_IF_SHARED mode, then
> -	 * we want an absolute path unless the two share a common prefix, so don't
> -	 * set it in that case, since doing so causes a relative path to always
> -	 * be produced if possible.
> -	 */
> -	if (!prefix && (format != FORMAT_DEFAULT || def != DEFAULT_RELATIVE_IF_SHARED))
> -		prefix = cwd = xgetcwd();
> -	if (format == FORMAT_DEFAULT && def == DEFAULT_UNMODIFIED) {
> -		puts(path);
> -	} else if (format == FORMAT_RELATIVE ||
> -		  (format == FORMAT_DEFAULT && def == DEFAULT_RELATIVE)) {
> -		/*
> -		 * In order for relative_path to work as expected, we need to
> -		 * make sure that both paths are absolute paths.  If we don't,
> -		 * we can end up with an unexpected absolute path that the user
> -		 * didn't want.
> -		 */
> -		struct strbuf buf = STRBUF_INIT, realbuf = STRBUF_INIT, prefixbuf = STRBUF_INIT;
> -		if (!is_absolute_path(path)) {
> -			strbuf_realpath_forgiving(&realbuf, path,  1);
> -			path = realbuf.buf;
> -		}
> -		if (!is_absolute_path(prefix)) {
> -			strbuf_realpath_forgiving(&prefixbuf, prefix, 1);
> -			prefix = prefixbuf.buf;
> -		}
> -		puts(relative_path(path, prefix, &buf));
> -		strbuf_release(&buf);
> -		strbuf_release(&realbuf);
> -		strbuf_release(&prefixbuf);
> -	} else if (format == FORMAT_DEFAULT && def == DEFAULT_RELATIVE_IF_SHARED) {
> -		struct strbuf buf = STRBUF_INIT;
> -		puts(relative_path(path, prefix, &buf));
> -		strbuf_release(&buf);
> -	} else {
> -		struct strbuf buf = STRBUF_INIT;
> -		strbuf_realpath_forgiving(&buf, path, 1);
> -		puts(buf.buf);
> -		strbuf_release(&buf);
> -	}
> -	free(cwd);
> +	struct strbuf sb = STRBUF_INIT;
> +	/* If the user didn't explicitly specify a format, fallback to the path-specific default. */
> +	enum path_format fmt = (arg_path_format != PATH_FORMAT_DEFAULT) ? arg_path_format : def_format;
> +
> +	append_formatted_path(&sb, path, prefix, fmt);
> +	puts(sb.buf);
> +
> +	strbuf_release(&sb);
>   }
>   
>   int cmd_rev_parse(int argc,
> @@ -717,7 +661,7 @@ int cmd_rev_parse(int argc,
>   	const char *name = NULL;
>   	struct strbuf buf = STRBUF_INIT;
>   	int seen_end_of_options = 0;
> -	enum format_type format = FORMAT_DEFAULT;
> +	enum path_format arg_path_format = PATH_FORMAT_DEFAULT;

This is the source of the api wart I referred to in the previous patch. 
Could we keep the existing enums and convert them into the appropriate 
PATH_FORMAT_* flag in print_path() above? I think we already have the 
logic to do that in the existing code. That would mean that other users 
of append_formatted_path() don't have to worry about the extra flag.

Thanks

Phillip

>   
>   	show_usage_if_asked(argc, argv, builtin_rev_parse_usage);
>   
> @@ -797,8 +741,8 @@ int cmd_rev_parse(int argc,
>   					die(_("--git-path requires an argument"));
>   				print_path(repo_git_path_replace(the_repository, &buf,
>   								 "%s", argv[i + 1]), prefix,
> -						format,
> -						DEFAULT_RELATIVE_IF_SHARED);
> +						arg_path_format,
> +						PATH_FORMAT_RELATIVE_IF_SHARED);
>   				i++;
>   				continue;
>   			}
> @@ -820,9 +764,9 @@ int cmd_rev_parse(int argc,
>   				if (!arg)
>   					die(_("--path-format requires an argument"));
>   				if (!strcmp(arg, "absolute")) {
> -					format = FORMAT_CANONICAL;
> +					arg_path_format = PATH_FORMAT_CANONICAL;
>   				} else if (!strcmp(arg, "relative")) {
> -					format = FORMAT_RELATIVE;
> +					arg_path_format = PATH_FORMAT_RELATIVE;
>   				} else {
>   					die(_("unknown argument to --path-format: %s"), arg);
>   				}
> @@ -985,7 +929,7 @@ int cmd_rev_parse(int argc,
>   			if (!strcmp(arg, "--show-toplevel")) {
>   				const char *work_tree = repo_get_work_tree(the_repository);
>   				if (work_tree)
> -					print_path(work_tree, prefix, format, DEFAULT_UNMODIFIED);
> +					print_path(work_tree, prefix, arg_path_format, PATH_FORMAT_UNMODIFIED);
>   				else
>   					die(_("this operation must be run in a work tree"));
>   				continue;
> @@ -993,7 +937,7 @@ int cmd_rev_parse(int argc,
>   			if (!strcmp(arg, "--show-superproject-working-tree")) {
>   				struct strbuf superproject = STRBUF_INIT;
>   				if (get_superproject_working_tree(&superproject))
> -					print_path(superproject.buf, prefix, format, DEFAULT_UNMODIFIED);
> +					print_path(superproject.buf, prefix, arg_path_format, PATH_FORMAT_UNMODIFIED);
>   				strbuf_release(&superproject);
>   				continue;
>   			}
> @@ -1028,18 +972,18 @@ int cmd_rev_parse(int argc,
>   				const char *gitdir = getenv(GIT_DIR_ENVIRONMENT);
>   				char *cwd;
>   				int len;
> -				enum format_type wanted = format;
> +				enum path_format wanted = arg_path_format;
>   				if (arg[2] == 'g') {	/* --git-dir */
>   					if (gitdir) {
> -						print_path(gitdir, prefix, format, DEFAULT_UNMODIFIED);
> +						print_path(gitdir, prefix, arg_path_format, PATH_FORMAT_UNMODIFIED);
>   						continue;
>   					}
>   					if (!prefix) {
> -						print_path(".git", prefix, format, DEFAULT_UNMODIFIED);
> +						print_path(".git", prefix, arg_path_format, PATH_FORMAT_UNMODIFIED);
>   						continue;
>   					}
>   				} else {		/* --absolute-git-dir */
> -					wanted = FORMAT_CANONICAL;
> +					wanted = PATH_FORMAT_CANONICAL;
>   					if (!gitdir && !prefix)
>   						gitdir = ".git";
>   					if (gitdir) {
> @@ -1055,11 +999,11 @@ int cmd_rev_parse(int argc,
>   				strbuf_reset(&buf);
>   				strbuf_addf(&buf, "%s%s.git", cwd, len && cwd[len-1] != '/' ? "/" : "");
>   				free(cwd);
> -				print_path(buf.buf, prefix, wanted, DEFAULT_CANONICAL);
> +				print_path(buf.buf, prefix, wanted, PATH_FORMAT_CANONICAL);
>   				continue;
>   			}
>   			if (!strcmp(arg, "--git-common-dir")) {
> -				print_path(repo_get_common_dir(the_repository), prefix, format, DEFAULT_RELATIVE_IF_SHARED);
> +				print_path(repo_get_common_dir(the_repository), prefix, arg_path_format, PATH_FORMAT_RELATIVE_IF_SHARED);
>   				continue;
>   			}
>   			if (!strcmp(arg, "--is-inside-git-dir")) {
> @@ -1089,7 +1033,7 @@ int cmd_rev_parse(int argc,
>   				if (the_repository->index->split_index) {
>   					const struct object_id *oid = &the_repository->index->split_index->base_oid;
>   					const char *path = repo_git_path_replace(the_repository, &buf, "sharedindex.%s", oid_to_hex(oid));
> -					print_path(path, prefix, format, DEFAULT_RELATIVE);
> +					print_path(path, prefix, arg_path_format, PATH_FORMAT_RELATIVE);
>   				}
>   				continue;
>   			}


^ permalink raw reply

* Re: [GSoC Patch v5 1/4] path: introduce append_formatted_path() for shared path formatting
From: Phillip Wood @ 2026-06-16 13:08 UTC (permalink / raw)
  To: K Jayatheerth, git
  Cc: jltobler, lucasseikioshiro, gitster, phillip.wood, sandals,
	kumarayushjha123, a3205153416
In-Reply-To: <20260616044953.184806-2-jayatheerthkulkarni2005@gmail.com>

On 16/06/2026 05:49, K Jayatheerth wrote:
> The path-formatting logic in builtin/rev-parse.c is tightly coupled
> to that command and writes directly to stdout, making it impossible
> for other builtins to reuse.
> 
> Extract the core algorithm into append_formatted_path() in path.c
> and expose a path_format enum in path.h so that any builtin can
> format paths consistently without duplicating logic.

Sorry I haven't had time to look at this series recently, it is looking 
much nicer now that we have a single enum. It would be helpful to 
explain why we need PATH_FORMAT_DEFAULT that acts exactly like 
PATH_FORMAT_UNMODIFIED. Looking at the next patch it seems this is still 
a wart in the api due to rev-parse wanting needing to distinguish the 
unmodified case from the default case.

Thanks

Phillip

> Mentored-by: Justin Tobler <jltobler@gmail.com>
> Mentored-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
> Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
> ---
>   path.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   path.h | 36 ++++++++++++++++++++++++++++++
>   2 files changed, 106 insertions(+)
> 
> diff --git a/path.c b/path.c
> index d7e17bf174..5e83e3e4f6 100644
> --- a/path.c
> +++ b/path.c
> @@ -1579,6 +1579,76 @@ char *xdg_cache_home(const char *filename)
>   	return NULL;
>   }
>   
> +void append_formatted_path(struct strbuf *dest, const char *path,
> +			   const char *prefix, enum path_format format)
> +{
> +	switch (format) {
> +	case PATH_FORMAT_DEFAULT:
> +	case PATH_FORMAT_UNMODIFIED:
> +		strbuf_addstr(dest, path);
> +		break;
> +
> +	case PATH_FORMAT_RELATIVE: {
> +		struct strbuf relative_buf = STRBUF_INIT;
> +		struct strbuf real_path = STRBUF_INIT;
> +		struct strbuf real_prefix = STRBUF_INIT;
> +		char *cwd = NULL;
> +
> +		/*
> +		 * We don't ever produce a relative path if prefix is NULL,
> +		 * so set the prefix to the current directory so that we can
> +		 * produce a relative path whenever possible.
> +		 */
> +		if (!prefix)
> +			prefix = cwd = xgetcwd();
> +
> +		if (!is_absolute_path(path)) {
> +			strbuf_realpath_forgiving(&real_path, path, 1);
> +			path = real_path.buf;
> +		}
> +		if (!is_absolute_path(prefix)) {
> +			strbuf_realpath_forgiving(&real_prefix, prefix, 1);
> +			prefix = real_prefix.buf;
> +		}
> +
> +		strbuf_addstr(dest, relative_path(path, prefix, &relative_buf));
> +
> +		strbuf_release(&relative_buf);
> +		strbuf_release(&real_path);
> +		strbuf_release(&real_prefix);
> +		free(cwd);
> +		break;
> +	}
> +
> +	case PATH_FORMAT_RELATIVE_IF_SHARED: {
> +		struct strbuf relative_buf = STRBUF_INIT;
> +
> +		/*
> +		 * If we're using RELATIVE_IF_SHARED mode, then we want an
> +		 * absolute path unless the two share a common prefix, so don't
> +		 * default the prefix to the current working directory. Doing so
> +		 * would cause a relative path to always be produced if possible.
> +		 */
> +		strbuf_addstr(dest, relative_path(path, prefix, &relative_buf));
> +		strbuf_release(&relative_buf);
> +		break;
> +	}
> +
> +	case PATH_FORMAT_CANONICAL: {
> +		struct strbuf canonical_buf = STRBUF_INIT;
> +
> +		strbuf_realpath_forgiving(&canonical_buf, path, 1);
> +		strbuf_addbuf(dest, &canonical_buf);
> +
> +		strbuf_release(&canonical_buf);
> +		break;
> +	}
> +
> +	default:
> +		BUG("unknown path_format value %d", format);
> +	}
> +}
> +
>   REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
>   REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
>   REPO_GIT_PATH_FUNC(merge_rr, "MERGE_RR")
> diff --git a/path.h b/path.h
> index 0434ba5e07..6aca53b100 100644
> --- a/path.h
> +++ b/path.h
> @@ -262,6 +262,42 @@ enum scld_error safe_create_leading_directories_no_share(char *path);
>   int safe_create_file_with_leading_directories(struct repository *repo,
>   					      const char *path);
>   
> +/**
> + * The formatting strategy to apply when writing a path into a buffer.
> + */
> +enum path_format {
> +	/*
> +	 * Represents the default formatting behavior. Treated as
> +	 * PATH_FORMAT_UNMODIFIED by append_formatted_path().
> +	 */
> +	PATH_FORMAT_DEFAULT,
> +
> +	/* Output the path exactly as-is without any modifications. */
> +	PATH_FORMAT_UNMODIFIED,
> +
> +	/* Output a path relative to the provided directory prefix. */
> +	PATH_FORMAT_RELATIVE,
> +
> +	/* Output a relative path only if the path shares a root with the prefix. */
> +	PATH_FORMAT_RELATIVE_IF_SHARED,
> +
> +	/* Output a fully resolved, absolute canonical path. */
> +	PATH_FORMAT_CANONICAL
> +};
> +
> +/**
> + * Format a path according to the specified formatting strategy and append
> + * the result to the given strbuf.
> + *
> + * `dest`   : The string buffer to append the formatted path to.
> + * `path`   : The path string that needs to be formatted.
> + * `prefix` : The directory prefix to calculate relative offsets against.
> + * Pass NULL to default to the current working directory where applicable.
> + * `format` : The formatting behavior rule to execute.
> + */
> +void append_formatted_path(struct strbuf *dest, const char *path,
> +			   const char *prefix, enum path_format format);
> +
>   # ifdef USE_THE_REPOSITORY_VARIABLE
>   #  include "strbuf.h"
>   #  include "repository.h"


^ permalink raw reply

* Re: [PATCH v5 2/2] graph: indent visual root in graph
From: Pablo Sabater @ 2026-06-16 13:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Taylor Blau, git, ayu.chandekar, chandrapratap3519,
	christian.couder, jltobler, karthik.188, peff, phillip.wood,
	siddharthasthana31
In-Reply-To: <xmqqzf0vbyj8.fsf@gitster.g>

El lun, 15 jun 2026 a las 17:42, Junio C Hamano (<gitster@pobox.com>) escribió:
>
> Pablo Sabater <pabloosabaterr@gmail.com> writes:
>
> > It does not make it unpredictable but it makes it not output what I
> > wanted to test, what I wanted to test is having an active column at
> > the same time that visual roots in different cases were being rendered
> > on another column.
>
> Oh, use of commit-graph changes the traversal order, which would
> affect how the graph is drawn, and there is no way to ensure that we
> traverse in the same way with or without commit-graph?  That's
> inconvenient.  But even without commit-graph, do we guarantee the
> same traversal order forever?  I doubt it.  So I suspect that it is
> a brittle workaround to disable commit-graph in the longer term.

Hi!

About the traversal order, aren't all the graph tests dependent on the
traversal order?  If it changed they would all need to be updated
because the tests are hardcoded expects of the graph.
I guess it might be more brittle than other graph tests specially
because it also depends on removing files, I tried using "git config
core.commitGraph false" or "--date-order" but I still get different
results and removing the files fixed it. If someone knows a better way
of doing it I'm happy to change it.

>
> As long as the graph engine shows correct graph no matter what order
> the commits come out of the revision traversal engine, we won't hurt
> end-users, but we need our tests to be reproducible, so that is a
> bit unfortunate.
>
> Anyway, stepping back a bit,
>
> > However having GIT_TEST_COMMIT_GRAPH in the last
> > text for example changes from:
> >
> > * 41_octopus
> > | * 43_B
> > |  \
> > |   * 43_A
> > | * 42_B
> > | * 42_A
> > * 41_B
> > * 41_A
>
> Does the "vertically aligned * on 2nd and later columns do not mean
> any parent-child relationship" rule no longer apply in this version?
> IOW, does the above graph show that
>
>  - 41_A is a parent of 41_B, which is a parent of 41_octopus
>  - 42_A is a parent of 42_B, and
>  - 43_A is a parent of 43_B but is not related to 42_B

Yes, this means that all commits vertically adjacent are related,
those who are not related and can cause that ambiguity get indented
(43_A).

>
> ?  Who are the parents of 41_octopus?  It has no relationship with
> 42_B and 43_B, and unlike what its name suggests, it has only 41_b
> as its parent (probably with history simplification that makes only
> these commits shown)?

On this test we are using "--first-parent" which excludes all the
parents but the first one, but later we force its excluded parents to
be shown.
We exclude 42_* and 43_* branches and then force them to appear as
unrelated branches.

>
> > to:
> >
> > * 41_octopus
> > * 41_B
> >  \
> >   * 41_A
> > * 43_B
> >  \
> >   * 43_A
> > * 42_B
> > * 42_A
>
> And this graph shows the same inter-commit relationship.  So both
> are correctly showing what we want to express, but they show the
> same information differently, making test_cmp unhappy?

Yes they show the same information.  On the second graph every commit
is on the first column (or second if they get indented) but on the
first graph we have:

*
| * <- visual root on second column
^
`----- first column remains active

If you tested v3 with this case you would see that it assumes that
visual roots only happen to be rendered on the first column, therefore
failing to correctly indent those visual roots on the second column,
which this test proves that they can appear on other columns.

Back to the test:

  * 41_octopus
  | * 43_B
  |  \
  |   * 43_A
  | * 42_B
  | * 42_A
  * 41_B
  * 41_A

43_A is rendered on the second column (first column is active by the
41_* branch) and gets indented to the third one. With commit-graph it
would be on the first and get indented to the second, making it the
same as more general tests above in "t4218", it is an edge case but
shows that indentation works correctly independently where the visual
root is.

>
> Thanks.

Thanks,

Pablo

^ permalink raw reply

* [PATCH v3] read_gitfile(): simplify NOT_A_REPO error message
From: Jeff King @ 2026-06-16 12:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, Tian Yuchen
In-Reply-To: <20260616111919.GC687438@coredump.intra.peff.net>

On Tue, Jun 16, 2026 at 07:19:20AM -0400, Jeff King wrote:

> Here it is.
> 
> -- >8 --
> Subject: read_gitfile(): simplify NOT_A_REPO error message

<sigh> This triggered a failure in CI after passing tests locally.
Turned out to be a race in t7450.

Here's an update, with range-diff.

1:  daf7f99511 ! 1:  67d42141e9 read_gitfile(): simplify NOT_A_REPO error message
    @@ Commit message
         We'll tweak the test to match the new error, but there's no need to beef
         it up further, since we're not showing the pointed-to path at all.
     
    +    We also racily trigger this in t7450. During parallel cloning we might
    +    see one of several errors, including this one. And so we must update
    +    that message, too (you can otherwise find the failure pretty quickly by
    +    running t7450 with --stress).
    +
         Signed-off-by: Jeff King <peff@peff.net>
     
      ## setup.c ##
    @@ t/t0002-gitfile.sh: test_expect_success 'bad setup: invalid .git file format' '
      '
      
      test_expect_success 'final setup + check rev-parse --git-dir' '
    +
    + ## t/t7450-bad-git-dotfiles.sh ##
    +@@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'git dirs of sibling submodules must not be nested' '
    + test_expect_success 'submodule git dir nesting detection must work with parallel cloning' '
    + 	test_must_fail git clone --recurse-submodules --jobs=2 nested clone_parallel 2>err &&
    + 	cat err &&
    +-	grep -E "(already exists|is inside git dir|not a git repository)" err &&
    ++	grep -E "(already exists|is inside git dir|does not point to a valid repository)" err &&
    + 	{
    + 		test_path_is_missing .git/modules/hippo/HEAD ||
    + 		test_path_is_missing .git/modules/hippo/hooks/HEAD

-- >8 --
Subject: [PATCH] read_gitfile(): simplify NOT_A_REPO error message

If a .git file is well-formed but points to a directory that is not
itself a valid repository, then we say:

  fatal: not a git repository: <pointed-to-repo>

without mentioning the .git file that pointed us there in the first
place. Doing so could better help the user understand the source of the
problem.

In theory the most helpful thing we could do is mention both paths,
like:

  gitfile '<gitfile>' points to invalid repository: <pointed-to-repo>

But there's another catch: when we generate the error, we don't always
know the pointed-to repository! This leads to a potential segfault.

The message comes from read_gitfile_error_die(). Originally we only
called that function from inside read_gitfile_gently(), passing in both
the gitfile path and the pointed-to path. But that changed in 1dd27bfbfd
(setup: improve error diagnosis for invalid .git files, 2026-03-04).
Since then, the caller in setup_git_directory_gently(), even if it wants
to die on error, always passes in the "return_error_code" flag, asking
the function to instead return a numeric error code. And then it calls
read_gitfile_error_die() itself, passing NULL for the pointed-to path.

If we get the READ_GITFILE_ERR_NOT_A_REPO code, we form a message using
that NULL pointer, and either segfault or get garbage like "not a git
repository: (null)", depending on the platform.

We could fix this by having the function pass out both the numeric error
code and the pointed-to path. But that creates a new headache: we have
to allocate that string on the heap and pass ownership back to the
caller. So now every caller has to be aware of it (and either free the
result, or signal that they are not interested by using an extra
parameter).

Instead, let's just drop the pointed-to path from the error message
entirely, and mention only the gitfile. This fixes the NULL dereference
without introducing any more complexity. The user-facing error message
is not as detailed as it could be, but is better than the original.
Since it mentions the gitfile, a user investigating the situation can
look there to find the pointed-to path (whereas you could not go the
other way from the original message).

There's an existing test in t0002 which triggers this case, but we
didn't notice the problem because it checks only that we said "not a
repository", and not the full string. So if we print "(null)" it is
happy. It will probably crash on some non-glibc platforms, but nobody
seems to have reported it yet (the breakage is recent-ish as of v2.54).
I'm also somewhat surprised that building with ASan/UBSan doesn't catch
this, but it doesn't seem to (and I found an open issue with somebody
asking for NULL printf checks to be implemented in the sanitizers).

We'll tweak the test to match the new error, but there's no need to beef
it up further, since we're not showing the pointed-to path at all.

We also racily trigger this in t7450. During parallel cloning we might
see one of several errors, including this one. And so we must update
that message, too (you can otherwise find the failure pretty quickly by
running t7450 with --stress).

Signed-off-by: Jeff King <peff@peff.net>
---
 setup.c                     | 9 +++++----
 setup.h                     | 2 +-
 submodule.c                 | 2 +-
 t/t0002-gitfile.sh          | 2 +-
 t/t7450-bad-git-dotfiles.sh | 2 +-
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/setup.c b/setup.c
index b4652651df..b1d9249d91 100644
--- a/setup.c
+++ b/setup.c
@@ -917,7 +917,7 @@ int verify_repository_format(const struct repository_format *format,
 	return 0;
 }
 
-void read_gitfile_error_die(int error_code, const char *path, const char *dir)
+void read_gitfile_error_die(int error_code, const char *path)
 {
 	switch (error_code) {
 	case READ_GITFILE_ERR_NOT_A_FILE:
@@ -937,7 +937,8 @@ void read_gitfile_error_die(int error_code, const char *path, const char *dir)
 	case READ_GITFILE_ERR_NO_PATH:
 		die(_("no path in gitfile: %s"), path);
 	case READ_GITFILE_ERR_NOT_A_REPO:
-		die(_("not a git repository: %s"), dir);
+		die(_("gitfile does not point to a valid repository: %s"),
+		    path);
 	default:
 		BUG("unknown error code");
 	}
@@ -1028,7 +1029,7 @@ const char *read_gitfile_gently(const char *path, int *return_error_code)
 	if (return_error_code)
 		*return_error_code = error_code;
 	else if (error_code)
-		read_gitfile_error_die(error_code, path, dir);
+		read_gitfile_error_die(error_code, path);
 
 	free(buf);
 	return error_code ? NULL : path;
@@ -1629,7 +1630,7 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
 					return GIT_DIR_INVALID_GITFILE;
 			default:
 				if (die_on_error)
-					read_gitfile_error_die(error_code, dir->buf, NULL);
+					read_gitfile_error_die(error_code, dir->buf);
 				else
 					return GIT_DIR_INVALID_GITFILE;
 			}
diff --git a/setup.h b/setup.h
index 705d1d6ff7..df8c93687a 100644
--- a/setup.h
+++ b/setup.h
@@ -38,7 +38,7 @@ int is_nonbare_repository_dir(struct strbuf *path);
 #define READ_GITFILE_ERR_TOO_LARGE 8
 #define READ_GITFILE_ERR_MISSING 9
 #define READ_GITFILE_ERR_IS_A_DIR 10
-void read_gitfile_error_die(int error_code, const char *path, const char *dir);
+void read_gitfile_error_die(int error_code, const char *path);
 const char *read_gitfile_gently(const char *path, int *return_error_code);
 #define read_gitfile(path) read_gitfile_gently((path), NULL)
 const char *resolve_gitdir_gently(const char *suspect, int *return_error_code);
diff --git a/submodule.c b/submodule.c
index fd91201a92..93d0361072 100644
--- a/submodule.c
+++ b/submodule.c
@@ -2579,7 +2579,7 @@ void absorb_git_dir_into_superproject(const char *path,
 
 		if (err_code != READ_GITFILE_ERR_NOT_A_REPO)
 			/* We don't know what broke here. */
-			read_gitfile_error_die(err_code, path, NULL);
+			read_gitfile_error_die(err_code, path);
 
 		/*
 		* Maybe populated, but no git directory was found?
diff --git a/t/t0002-gitfile.sh b/t/t0002-gitfile.sh
index dfbcdddbcc..6356e9ec72 100755
--- a/t/t0002-gitfile.sh
+++ b/t/t0002-gitfile.sh
@@ -27,7 +27,7 @@ test_expect_success 'bad setup: invalid .git file format' '
 test_expect_success 'bad setup: invalid .git file path' '
 	echo "gitdir: $REAL.not" >.git &&
 	test_must_fail git rev-parse 2>.err &&
-	test_grep "not a git repository" .err
+	test_grep "gitfile does not point to a valid repository" .err
 '
 
 test_expect_success 'final setup + check rev-parse --git-dir' '
diff --git a/t/t7450-bad-git-dotfiles.sh b/t/t7450-bad-git-dotfiles.sh
index 8cc86522b2..69a17a9d13 100755
--- a/t/t7450-bad-git-dotfiles.sh
+++ b/t/t7450-bad-git-dotfiles.sh
@@ -350,7 +350,7 @@ test_expect_success 'git dirs of sibling submodules must not be nested' '
 test_expect_success 'submodule git dir nesting detection must work with parallel cloning' '
 	test_must_fail git clone --recurse-submodules --jobs=2 nested clone_parallel 2>err &&
 	cat err &&
-	grep -E "(already exists|is inside git dir|not a git repository)" err &&
+	grep -E "(already exists|is inside git dir|does not point to a valid repository)" err &&
 	{
 		test_path_is_missing .git/modules/hippo/HEAD ||
 		test_path_is_missing .git/modules/hippo/hooks/HEAD
-- 
2.55.0.rc0.346.g4c7eff6ddc


^ permalink raw reply related

* [PATCH v2] read_gitfile(): simplify NOT_A_REPO error message
From: Jeff King @ 2026-06-16 11:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, Tian Yuchen
In-Reply-To: <xmqqeciezh0w.fsf@gitster.g>

On Wed, Jun 10, 2026 at 06:01:03AM -0700, Junio C Hamano wrote:

> > I have to agree that the patch is somewhat gross, and I myself don't
> > really see much of an issue to move to an error message like the above
> > if it ends up simplifying the logic.
> 
> So we are in agreement among three of us that simplifying the code
> to lose error message with dubious value would be a good way
> forward.
> 
> Peff, can we have a formal [v2] then?

Here it is.

-- >8 --
Subject: read_gitfile(): simplify NOT_A_REPO error message

If a .git file is well-formed but points to a directory that is not
itself a valid repository, then we say:

  fatal: not a git repository: <pointed-to-repo>

without mentioning the .git file that pointed us there in the first
place. Doing so could better help the user understand the source of the
problem.

In theory the most helpful thing we could do is mention both paths,
like:

  gitfile '<gitfile>' points to invalid repository: <pointed-to-repo>

But there's another catch: when we generate the error, we don't always
know the pointed-to repository! This leads to a potential segfault.

The message comes from read_gitfile_error_die(). Originally we only
called that function from inside read_gitfile_gently(), passing in both
the gitfile path and the pointed-to path. But that changed in 1dd27bfbfd
(setup: improve error diagnosis for invalid .git files, 2026-03-04).
Since then, the caller in setup_git_directory_gently(), even if it wants
to die on error, always passes in the "return_error_code" flag, asking
the function to instead return a numeric error code. And then it calls
read_gitfile_error_die() itself, passing NULL for the pointed-to path.

If we get the READ_GITFILE_ERR_NOT_A_REPO code, we form a message using
that NULL pointer, and either segfault or get garbage like "not a git
repository: (null)", depending on the platform.

We could fix this by having the function pass out both the numeric error
code and the pointed-to path. But that creates a new headache: we have
to allocate that string on the heap and pass ownership back to the
caller. So now every caller has to be aware of it (and either free the
result, or signal that they are not interested by using an extra
parameter).

Instead, let's just drop the pointed-to path from the error message
entirely, and mention only the gitfile. This fixes the NULL dereference
without introducing any more complexity. The user-facing error message
is not as detailed as it could be, but is better than the original.
Since it mentions the gitfile, a user investigating the situation can
look there to find the pointed-to path (whereas you could not go the
other way from the original message).

There's an existing test in t0002 which triggers this case, but we
didn't notice the problem because it checks only that we said "not a
repository", and not the full string. So if we print "(null)" it is
happy. It will probably crash on some non-glibc platforms, but nobody
seems to have reported it yet (the breakage is recent-ish as of v2.54).
I'm also somewhat surprised that building with ASan/UBSan doesn't catch
this, but it doesn't seem to (and I found an open issue with somebody
asking for NULL printf checks to be implemented in the sanitizers).

We'll tweak the test to match the new error, but there's no need to beef
it up further, since we're not showing the pointed-to path at all.

Signed-off-by: Jeff King <peff@peff.net>
---
 setup.c            | 9 +++++----
 setup.h            | 2 +-
 submodule.c        | 2 +-
 t/t0002-gitfile.sh | 2 +-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/setup.c b/setup.c
index b4652651df..b1d9249d91 100644
--- a/setup.c
+++ b/setup.c
@@ -917,7 +917,7 @@ int verify_repository_format(const struct repository_format *format,
 	return 0;
 }
 
-void read_gitfile_error_die(int error_code, const char *path, const char *dir)
+void read_gitfile_error_die(int error_code, const char *path)
 {
 	switch (error_code) {
 	case READ_GITFILE_ERR_NOT_A_FILE:
@@ -937,7 +937,8 @@ void read_gitfile_error_die(int error_code, const char *path, const char *dir)
 	case READ_GITFILE_ERR_NO_PATH:
 		die(_("no path in gitfile: %s"), path);
 	case READ_GITFILE_ERR_NOT_A_REPO:
-		die(_("not a git repository: %s"), dir);
+		die(_("gitfile does not point to a valid repository: %s"),
+		    path);
 	default:
 		BUG("unknown error code");
 	}
@@ -1028,7 +1029,7 @@ const char *read_gitfile_gently(const char *path, int *return_error_code)
 	if (return_error_code)
 		*return_error_code = error_code;
 	else if (error_code)
-		read_gitfile_error_die(error_code, path, dir);
+		read_gitfile_error_die(error_code, path);
 
 	free(buf);
 	return error_code ? NULL : path;
@@ -1629,7 +1630,7 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
 					return GIT_DIR_INVALID_GITFILE;
 			default:
 				if (die_on_error)
-					read_gitfile_error_die(error_code, dir->buf, NULL);
+					read_gitfile_error_die(error_code, dir->buf);
 				else
 					return GIT_DIR_INVALID_GITFILE;
 			}
diff --git a/setup.h b/setup.h
index 705d1d6ff7..df8c93687a 100644
--- a/setup.h
+++ b/setup.h
@@ -38,7 +38,7 @@ int is_nonbare_repository_dir(struct strbuf *path);
 #define READ_GITFILE_ERR_TOO_LARGE 8
 #define READ_GITFILE_ERR_MISSING 9
 #define READ_GITFILE_ERR_IS_A_DIR 10
-void read_gitfile_error_die(int error_code, const char *path, const char *dir);
+void read_gitfile_error_die(int error_code, const char *path);
 const char *read_gitfile_gently(const char *path, int *return_error_code);
 #define read_gitfile(path) read_gitfile_gently((path), NULL)
 const char *resolve_gitdir_gently(const char *suspect, int *return_error_code);
diff --git a/submodule.c b/submodule.c
index fd91201a92..93d0361072 100644
--- a/submodule.c
+++ b/submodule.c
@@ -2579,7 +2579,7 @@ void absorb_git_dir_into_superproject(const char *path,
 
 		if (err_code != READ_GITFILE_ERR_NOT_A_REPO)
 			/* We don't know what broke here. */
-			read_gitfile_error_die(err_code, path, NULL);
+			read_gitfile_error_die(err_code, path);
 
 		/*
 		* Maybe populated, but no git directory was found?
diff --git a/t/t0002-gitfile.sh b/t/t0002-gitfile.sh
index dfbcdddbcc..6356e9ec72 100755
--- a/t/t0002-gitfile.sh
+++ b/t/t0002-gitfile.sh
@@ -27,7 +27,7 @@ test_expect_success 'bad setup: invalid .git file format' '
 test_expect_success 'bad setup: invalid .git file path' '
 	echo "gitdir: $REAL.not" >.git &&
 	test_must_fail git rev-parse 2>.err &&
-	test_grep "not a git repository" .err
+	test_grep "gitfile does not point to a valid repository" .err
 '
 
 test_expect_success 'final setup + check rev-parse --git-dir' '
-- 
2.55.0.rc0.342.g45e27e83e2


^ permalink raw reply related

* Re: [PATCH 3/4] builtin/refs: add "update" subcommand
From: Junio C Hamano @ 2026-06-16 11:17 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <20260616-pks-refs-writing-subcommands-v1-3-9f5219b6109d@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> Add a new "update" subcommand which mirrors `git update-ref <refname>
> <oldoid> <newoid>`. This follows the same reasoning as the preceding
> commit.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  Documentation/git-refs.adoc |   7 ++
>  builtin/refs.c              |  50 +++++++++++++
>  t/meson.build               |   1 +
>  t/t1465-refs-update.sh      | 179 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 237 insertions(+)

I do not offhand know (and I am still away by 2 hours from the time
I wake up and start functioning) if update-ref shares the same
issue, but with "delete, update, rename" combo, lack of "create"
feels a bit annoying.  Wouldn't we want to offer an option to users
who want to ensure that the refs they create are truly new and they
are not overwriting a ref somebody has created?  Either (1) drop
"delete" and take a special value (e.g. "") as <newvalue> to signal
deletion and make the same special value used as <oldvalue> signals
creation, or (2) add "create" and insist that "update" takes only an
existing ref, would make the annoyance go away, I guess.

> +test_expect_success 'update creates a new reference' '
> +	test_when_finished "rm -rf repo" &&
> +	setup_repo repo &&
> +	(
> +		cd repo &&
> +		A=$(git rev-parse A) &&
> +		git refs update refs/heads/foo $A &&
> +		test_ref_matches refs/heads/foo "$A"
> +	)
> +'

Here we cannot test (and I strongly suspect that "git refs update"
and "git update-ref" lack ability to do so) a case where a creation
is attempted on an existing ref and fails.

^ permalink raw reply

* Re: [PATCH] cat-file: speed up default format
From: Jeff King @ 2026-06-16 11:15 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List
In-Reply-To: <df933ffa-1be2-4401-a4ac-9d72c9c4cdcc@web.de>

On Mon, Jun 15, 2026 at 11:53:10PM +0200, René Scharfe wrote:

> >  	} else if (is_atom("rest", atom, len)) {
> > -		if (data->mark_query)
> > -			data->split_on_whitespace = 1;
> > -		else if (data->rest)
> 
> This removes support for rest being NULL, breaking t1006.381.

Yup. I did say "only lightly tested". ;)

The fix is obviously just:

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 9cc7ec7a6f..370ca6d771 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -363,7 +363,8 @@ static void objectsize_disk_add(struct format_item *item UNUSED,
 static void rest_add(struct format_item *item UNUSED,
 		     struct strbuf *sb, struct expand_data *data)
 {
-	strbuf_addstr(sb, data->rest);
+	if (data->rest)
+		strbuf_addstr(sb, data->rest);
 }
 
 static void deltabase_add(struct format_item *item UNUSED,

I think perhaps this error shows that the mark_query thing in the
existing code obfuscates the logic a bit.

-Peff

^ permalink raw reply related

* Re: [PATCH] cat-file: speed up default format
From: Jeff King @ 2026-06-16 11:12 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List
In-Reply-To: <10a33614-837f-4166-aa30-6de28b052692@web.de>

On Mon, Jun 15, 2026 at 11:53:07PM +0200, René Scharfe wrote:

> > IMHO that is probably not worth it for a custom parsing system just for
> > cat-file.  But if we were to finally unify ref-filter and cat-file (and
> > even --pretty=format) then it would probably worth doing this kind of
> > pre-parsing.
> It could be worth it for cat-file alone if we find the right balance, as
> it already does do a separate parsing step, but that is awkward with its
> mark_query checks all over the place and remembers only object property
> requirements and no other format string details.

Yeah, getting rid of the mark_query pass was a nice side effect of
having a true parse step.

> Making the opcodes small should be beneficial.  We need only a handful
> of them, so a byte each should suffice.  We can use a strbuf for that.
> 
> We can also store literal characters in there.  An opcode plus with a
> payload char incurs an overhead of 50%, which sounds high, but at least
> the default format only has two of them and it's much better than
> storing pointer plus size for an overhead of more than 90% in case of a
> single char.

True, and it's a size win if the literal portions tend to be small
(fewer than 15 bytes). You do lose out on the ability to strbuf_add()
them in one go, though. So lots more strbuf_grow() checks, etc. If you
really wanted to get fancy, you could follow the opcode with a length
represented as a variable-sized integer, followed by the literal bytes.

I'm not sure that Git's formatting code needs to squeeze out quite that
much performance, though.

> That gets us closer to native speed, at least on an Apple M1:
> 
> Benchmark 1: ./git_fp cat-file --batch-all-objects --batch-check='%(objectname)-%(objecttype)-%(objectsize)'
>   Time (mean ± σ):     992.7 ms ±   3.2 ms    [User: 967.5 ms, System: 23.8 ms]
>   Range (min … max):   990.1 ms … 1000.7 ms    10 runs
> 
> Benchmark 2: ./git_switch cat-file --batch-all-objects --batch-check='%(objectname)-%(objecttype)-%(objectsize)'
>   Time (mean ± σ):     991.8 ms ±   1.6 ms    [User: 967.0 ms, System: 23.3 ms]
>   Range (min … max):   989.3 ms … 994.4 ms    10 runs
> 
> Benchmark 3: ./git cat-file --batch-all-objects --batch-check='%(objectname)-%(objecttype)-%(objectsize)'
>   Time (mean ± σ):     985.8 ms ±   2.9 ms    [User: 960.5 ms, System: 23.6 ms]
>   Range (min … max):   982.9 ms … 993.0 ms    10 runs
> 
> Benchmark 4: ./git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype) %(objectsize)'
>   Time (mean ± σ):     982.1 ms ±   3.2 ms    [User: 956.7 ms, System: 23.6 ms]
>   Range (min … max):   979.2 ms … 989.2 ms    10 runs

OK, so we managed another 1%. But I'm skeptical that this linear opcode
technique is where we want to go in the long run, if we're ever going to
unify formatters.

One, for more advanced features like %(if) we'd want to support some
notion of hierarchy and recursion. We have to speculatively format the
inner part and see if it is empty.

Though I guess that is possible with a linearized set of opcodes. If you
turn "%(if)%(HEAD)%(then)*%(end)" into:

  FMT_IF
  FMT_HEAD
  FMT_THEN
  FMT_LITERAL
  *
  FMT_END

then I guess you just start a sub-execution of the opcodes after FMT_IF
and tell it to stop when it sees FMT_THEN. It does mean that the opcodes
themselves need to control the program counter, rather than the executor
blindly walking along the opcodes and asking them to put stuff in the
output. Whereas I think if the parser builds a tree of structs then this
falls out pretty naturally.

The second thing is that many of the ref-filter atoms have options, and
those options have to be represented in the opcodes. That works
naturally if each opcode gets its own struct (either with a big union,
or true polymorphism). But representing "%(describe:match=versions/v*)"
in opcodes sounds gross. Now you need opcodes to represent the options
(and maybe "no more options"), and some way of encoding arbitrary input
for those option arguments.

-Peff

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox