Git development
 help / color / mirror / Atom feed
* Re: [GSoC Patch v5 2/4] rev-parse: use append_formatted_path() for path formatting
From: Phillip Wood @ 2026-06-16 18:26 UTC (permalink / raw)
  To: K Jayatheerth
  Cc: git, jltobler, lucasseikioshiro, gitster, phillip.wood, sandals,
	kumarayushjha123, a3205153416
In-Reply-To: <CA+rGoLfhhRNrSReeJ1grhy+2K3BSrikTCNgGpCaGqc4fFp3Lfg@mail.gmail.com>



On 16/06/2026 18:04, K Jayatheerth wrote:
> Hi Phillip,
> Thanks for taking a look!
> 
>> On 16/06/2026 05:49, K Jayatheerth wrote:
>>> The path-formatting logic in builtin/rev-parse.c is tightly coupled
>>> to that command and writes directly to stdout, making it impossible
>>> for other builtins to reuse.
>>>
>>> Extract the core algorithm into append_formatted_path() in path.c
>>> and expose a path_format enum in path.h so that any builtin can
>>> format paths consistently without duplicating logic.
>>
>> Sorry I haven't had time to look at this series recently, it is looking
>> much nicer now that we have a single enum. It would be helpful to
>> explain why we need PATH_FORMAT_DEFAULT that acts exactly like
>> PATH_FORMAT_UNMODIFIED. Looking at the next patch it seems this is still
>> a wart in the api due to rev-parse wanting needing to distinguish the
>> unmodified case from the default case.
> t);
>>> +
>>>    # ifdef USE_THE_REPOSITORY_VARIABLE
>>>    #  include "strbuf.h"
>>>    #  include "repository.h"
>>
> 
> 
>>>    int cmd_rev_parse(int argc,
>>> @@ -717,7 +661,7 @@ int cmd_rev_parse(int argc,
>>>        const char *name = NULL;
>>>        struct strbuf buf = STRBUF_INIT;
>>>        int seen_end_of_options = 0;
>>> -     enum format_type format = FORMAT_DEFAULT;
>>> +     enum path_format arg_path_format = PATH_FORMAT_DEFAULT;
>>
>> This is the source of the api wart I referred to in the previous patch.
>> Could we keep the existing enums and convert them into the appropriate
>> PATH_FORMAT_* flag in print_path() above? I think we already have the
>> logic to do that in the existing code. That would mean that other users
>> of append_formatted_path() don't have to worry about the extra flag.
>>
> 
> That is a much more elegant solution than the current one.
> 
> For v6, I will clean this up by keeping the fallback logic
> localized within builtin/rev-parse.c and removing
> PATH_FORMAT_DEFAULT entirely from enum path_format in path.h.
> 
> Instead, I'll re-introduce a small local enum (e.g., enum
> rev_parse_format) inside rev-parse.c to handle the
> command-line parsing state (tracking whether the user
> explicitly provided a flag or if we are still in a
> neutral/default state).

I think it is probably simplest to keep the existing enums and modify 
print_path() to convert them to the appropriate PATH_FORMAT_*. That way 
we can keep the option parsing code as is.

Thanks

Phillip

> As you said, most of the logic is already present. In
> print_path(), we will check that local tracking enum. If it’s
> set to the local default, we can map it directly to the
> path-specific def_format before invoking append_formatted_path().
> This ensures other users of the function don't have to worry
> about the extra flag.
> 
> I will send out the v6 series with these fixes shortly.
> 
> Regards,
> - K Jayatheerth


^ permalink raw reply

* [PATCH] sequencer: Skip copying notes for commits that disappear during rebase
From: Uwe Kleine-König @ 2026-06-16 17:40 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood

When a commit disappears during rebase because the patch content is
already there (but not by the same patch in which case the commit would
be skipped) the notes of that disappearing commit should not be copied
to the unrelated commit that happens to be HEAD.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
---
Hello,

after also my 2nd bug report[1] didn't motivate anyone to come up with a
fix, I invested the time to work out one according to Phillip Wood's
suggestion.

IMHO it's not pretty, but it works for me.

Note that Phillip also suggested to integrete the test into
t3400-rebase.sh . IMHO it doesn't matter much if this is considered a
rebase test or a notes test. I kept it where I have it because I'm lazy
and failed to understand the git history created in that test.

Best regards
Uwe

[1] https://lore.kernel.org/git/20260612143952.3281115-2-u.kleine-koenig@baylibre.com


 sequencer.c             | 20 ++++++++++----------
 t/meson.build           |  1 +
 t/t3322-notes-rebase.sh | 37 +++++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+), 10 deletions(-)
 create mode 100755 t/t3322-notes-rebase.sh

diff --git a/sequencer.c b/sequencer.c
index 57855b0066ac..da2185a37c5d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2263,7 +2263,7 @@ static const char *reflog_message(struct replay_opts *opts,
 static int do_pick_commit(struct repository *r,
 			  struct todo_item *item,
 			  struct replay_opts *opts,
-			  int final_fixup, int *check_todo)
+			  int final_fixup, int *check_todo, int *dropped_commit)
 {
 	struct replay_ctx *ctx = opts->ctx;
 	unsigned int flags = should_edit(opts) ? EDIT_MSG : 0;
@@ -2273,7 +2273,7 @@ static int do_pick_commit(struct repository *r,
 	const char *base_label, *next_label, *reflog_action;
 	char *author = NULL;
 	struct commit_message msg = { NULL, NULL, NULL, NULL };
-	int res, unborn = 0, reword = 0, allow, drop_commit;
+	int res, unborn = 0, reword = 0, allow;
 	enum todo_command command = item->command;
 	struct commit *commit = item->commit;
 
@@ -2492,7 +2492,7 @@ static int do_pick_commit(struct repository *r,
 		goto leave;
 	}
 
-	drop_commit = 0;
+	*dropped_commit = 0;
 	allow = allow_empty(r, opts, commit);
 	if (allow < 0) {
 		res = allow;
@@ -2500,7 +2500,7 @@ static int do_pick_commit(struct repository *r,
 	} else if (allow == 1) {
 		flags |= ALLOW_EMPTY;
 	} else if (allow == 2) {
-		drop_commit = 1;
+		*dropped_commit = 1;
 		refs_delete_ref(get_main_ref_store(r), "", "CHERRY_PICK_HEAD",
 				NULL, REF_NO_DEREF);
 		unlink(git_path_merge_msg(r));
@@ -2510,7 +2510,7 @@ static int do_pick_commit(struct repository *r,
 			_("dropping %s %s -- patch contents already upstream\n"),
 			oid_to_hex(&commit->object.oid), msg.subject);
 	} /* else allow == 0 and there's nothing special to do */
-	if (!opts->no_commit && !drop_commit) {
+	if (!opts->no_commit && !*dropped_commit) {
 		if (author || command == TODO_REVERT || (flags & AMEND_MSG))
 			res = do_commit(r, msg_file, author, reflog_action,
 					opts, flags,
@@ -4943,12 +4943,12 @@ static int pick_one_commit(struct repository *r,
 			   struct replay_opts *opts,
 			   int *check_todo, int* reschedule)
 {
-	int res;
+	int res, dropped_commit;
 	struct todo_item *item = todo_list->items + todo_list->current;
 	const char *arg = todo_item_get_arg(todo_list, item);
 
 	res = do_pick_commit(r, item, opts, is_final_fixup(todo_list),
-			     check_todo);
+			     check_todo, &dropped_commit);
 	if (is_rebase_i(opts) && res < 0) {
 		/* Reschedule */
 		*reschedule = 1;
@@ -4965,7 +4965,7 @@ static int pick_one_commit(struct repository *r,
 		return error_with_patch(r, commit,
 					arg, item->arg_len, opts, res, !res);
 	}
-	if (is_rebase_i(opts) && !res)
+	if (is_rebase_i(opts) && !res && !dropped_commit)
 		record_in_rewritten(&item->commit->object.oid,
 				    peek_command(todo_list, 1));
 	if (res && is_fixup(item->command)) {
@@ -5523,14 +5523,14 @@ static int single_pick(struct repository *r,
 		       struct commit *cmit,
 		       struct replay_opts *opts)
 {
-	int check_todo;
+	int check_todo, dummy;
 	struct todo_item item;
 
 	item.command = opts->action == REPLAY_PICK ?
 			TODO_PICK : TODO_REVERT;
 	item.commit = cmit;
 
-	return do_pick_commit(r, &item, opts, 0, &check_todo);
+	return do_pick_commit(r, &item, opts, 0, &check_todo, &dummy);
 }
 
 int sequencer_pick_revisions(struct repository *r,
diff --git a/t/meson.build b/t/meson.build
index c5832fee0535..6927bd9c794f 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -358,6 +358,7 @@ integration_tests = [
   't3311-notes-merge-fanout.sh',
   't3320-notes-merge-worktrees.sh',
   't3321-notes-stripspace.sh',
+  't3322-notes-rebase.sh',
   't3400-rebase.sh',
   't3401-rebase-and-am-rename.sh',
   't3402-rebase-merge.sh',
diff --git a/t/t3322-notes-rebase.sh b/t/t3322-notes-rebase.sh
new file mode 100755
index 000000000000..0eddde7f9961
--- /dev/null
+++ b/t/t3322-notes-rebase.sh
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+test_description='Test notes on rebase'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	git init &&
+	git config notes.rewriteRef refs/notes/commits &&
+	git version > version &&
+	echo A > A &&
+	git add A &&
+	git commit -m A &&
+	git branch branch &&
+	echo B > B &&
+	git add B &&
+	git commit -m B &&
+	git notes add -m "This is B" @ &&
+	echo C > C &&
+	git add C &&
+	git commit -m C &&
+	git checkout branch &&
+	echo B > B &&
+	echo D > D &&
+	git add B D &&
+	git commit -m BD
+'
+
+test_expect_success 'rebase B + C on top of BD' '
+	git rebase @ master
+'
+
+test_expect_success 'assert there is no note on BD' '
+	if git notes list branch >/tmp/lalaa; then return 1; fi
+'
+
+test_done

base-commit: 3e65291872de10c3f0bf05ea8c24187e7a71ebf0
-- 
2.47.3


^ permalink raw reply related

* Re: [PATCH] rebase: mention --abort alongside --continue
From: Junio C Hamano @ 2026-06-16 17:33 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Harald Nordgren via GitGitGadget, git, Harald Nordgren
In-Reply-To: <89d72342-5aa1-4dcf-951b-d0c791f91738@gmail.com>

Phillip Wood <phillip.wood123@gmail.com> writes:

> Hi Harald
>
> On 15/06/2026 20:19, Harald Nordgren via GitGitGadget wrote:
>> From: Harald Nordgren <haraldnordgren@gmail.com>
>> 
>> The warning shown when an "exec" step fails and the "git status"
>> advice while splitting or editing a commit pointed users at "git
>> rebase --continue" but not "--abort". Mention it in both, matching
>> the conflict case.
>
> I'm not sure that the "failed exec" and "conflicts" cases are equivalent 
> though. If you have some nasty conflict that you don't want to resolve 
> then aborting and trying another approach such is incrementally rebasing 
> is the only option. If an exec command fails then it likely means that a 
> test has failed or some something similar which is minor inconvenience 
> which needs fixing before continuing - it seems very unlikely that the 
> user would want to abort the rebase.

It is very true that users who know what they are doing and got into
such conflicts are opted to go into such a situation tnat it is
unlikely that they would appreciate a choice to abort.

But given that for any system, everybody starts as a newbie, it may
be assuring to always give "here is a way out" option when they get
in a nasty confusing situation.  Discouraging the way to use the
tool that can lead to confusing situation by guiding them with BCP
workflows would help, but they always get into pitfall.

The patch adds new message into the existing message to suggest how
to move forward, but as a training wheel option, it may not be a bad
thing to offer "--abort" as an extra hint, separate from the
existing warning() message.


^ permalink raw reply

* Re: [GSoC Patch v5 2/4] rev-parse: use append_formatted_path() for path formatting
From: K Jayatheerth @ 2026-06-16 17:04 UTC (permalink / raw)
  To: Phillip Wood
  Cc: git, jltobler, lucasseikioshiro, gitster, phillip.wood, sandals,
	kumarayushjha123, a3205153416
In-Reply-To: <0077b1ae-3c85-4b34-a0ac-766395157c4f@gmail.com>

Hi Phillip,
Thanks for taking a look!

> On 16/06/2026 05:49, K Jayatheerth wrote:
> > The path-formatting logic in builtin/rev-parse.c is tightly coupled
> > to that command and writes directly to stdout, making it impossible
> > for other builtins to reuse.
> >
> > Extract the core algorithm into append_formatted_path() in path.c
> > and expose a path_format enum in path.h so that any builtin can
> > format paths consistently without duplicating logic.
>
> Sorry I haven't had time to look at this series recently, it is looking
> much nicer now that we have a single enum. It would be helpful to
> explain why we need PATH_FORMAT_DEFAULT that acts exactly like
> PATH_FORMAT_UNMODIFIED. Looking at the next patch it seems this is still
> a wart in the api due to rev-parse wanting needing to distinguish the
> unmodified case from the default case.
t);
> > +
> >   # ifdef USE_THE_REPOSITORY_VARIABLE
> >   #  include "strbuf.h"
> >   #  include "repository.h"
>


> >   int cmd_rev_parse(int argc,
> > @@ -717,7 +661,7 @@ int cmd_rev_parse(int argc,
> >       const char *name = NULL;
> >       struct strbuf buf = STRBUF_INIT;
> >       int seen_end_of_options = 0;
> > -     enum format_type format = FORMAT_DEFAULT;
> > +     enum path_format arg_path_format = PATH_FORMAT_DEFAULT;
>
> This is the source of the api wart I referred to in the previous patch.
> Could we keep the existing enums and convert them into the appropriate
> PATH_FORMAT_* flag in print_path() above? I think we already have the
> logic to do that in the existing code. That would mean that other users
> of append_formatted_path() don't have to worry about the extra flag.
>

That is a much more elegant solution than the current one.

For v6, I will clean this up by keeping the fallback logic
localized within builtin/rev-parse.c and removing
PATH_FORMAT_DEFAULT entirely from enum path_format in path.h.

Instead, I'll re-introduce a small local enum (e.g., enum
rev_parse_format) inside rev-parse.c to handle the
command-line parsing state (tracking whether the user
explicitly provided a flag or if we are still in a
neutral/default state).

As you said, most of the logic is already present. In
print_path(), we will check that local tracking enum. If it’s
set to the local default, we can map it directly to the
path-specific def_format before invoking append_formatted_path().
This ensures other users of the function don't have to worry
about the extra flag.

I will send out the v6 series with these fixes shortly.

Regards,
- K Jayatheerth

^ permalink raw reply

* Re: [PATCH v5 2/2] graph: indent visual root in graph
From: Junio C Hamano @ 2026-06-16 16:36 UTC (permalink / raw)
  To: Pablo Sabater
  Cc: Taylor Blau, git, ayu.chandekar, chandrapratap3519,
	christian.couder, jltobler, karthik.188, peff, phillip.wood,
	siddharthasthana31
In-Reply-To: <CAN5EUNR-o_sLzeWuy7M9UMFHBKxSuytNd=4p2svtFuv40E8vZg@mail.gmail.com>

Pablo Sabater <pabloosabaterr@gmail.com> writes:

> Back to the test:
>
>   * 41_octopus
>   | * 43_B
>   |  \
>   |   * 43_A
>   | * 42_B
>   | * 42_A
>   * 41_B
>   * 41_A
>
> 43_A is rendered on the second column (first column is active by the
> 41_* branch) and gets indented to the third one. With commit-graph it
> would be on the first and get indented to the second, making it the
> same as more general tests above in "t4218", it is an edge case but
> shows that indentation works correctly independently where the visual
> root is.

Sounds good.

^ permalink raw reply

* Re: [PATCH v2 0/6] Support hashing objects larger than 4GB on Windows
From: Junio C Hamano @ 2026-06-16 16:09 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Philip Oakley, Patrick Steinhardt, Johannes Schindelin
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Changes since v1:
>
>  * Rebased to current master to resolve the conflicts with
>    ps/odb-source-loose

Very much appreciated.

>  * Dropped the !LONG_IS_64BIT prereq from the added/touched tests, as it is
>    now no longer needed

Good thing to do and see that the code works as well as it could,
whether a long is 32-bit or 64-bit ;-).

> Philip Oakley (6):
>   hash-object: demonstrate a >4GB/LLP64 problem
>   object-file.c: use size_t for header lengths
>   hash algorithms: use size_t for section lengths
>   hash-object --stdin: verify that it works with >4GB/LLP64
>   hash-object: add another >4GB/LLP64 test case
>   hash-object: add a >4GB/LLP64 test case using filtered input
>
>  object-file.c          | 14 +++++++-------
>  object-file.h          |  6 +++---
>  odb/source-files.c     |  2 +-
>  odb/source-inmemory.c  |  2 +-
>  odb/source-loose.c     |  4 ++--
>  odb/source.h           |  2 +-
>  sha1dc_git.c           |  3 +--
>  sha1dc_git.h           |  2 +-
>  t/t1007-hash-object.sh | 39 +++++++++++++++++++++++++++++++++++++++
>  9 files changed, 56 insertions(+), 18 deletions(-)

Will queue.  Thanks.

>
>
> base-commit: 700432b2ba22603a0bcb71475c9c333d17c9b0d1
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2138%2Fdscho%2FPhilipOakley%2Fhashliteral_t-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2138/dscho/PhilipOakley/hashliteral_t-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/2138
>
> Range-diff vs v1:
>
>  1:  84e1cd0aa0 = 1:  9c01bac407 hash-object: demonstrate a >4GB/LLP64 problem
>  2:  809d83e46f ! 2:  aa5859c14f object-file.c: use size_t for header lengths
>      @@ Commit message

By the way, how is range-diff driven via GGG?  After applying these
patches on the same base commit, my "git range-diff v1...v2" invocation
punts on matching step 2 and I do not get a comparison like this
unless I give --creation-factor=<large number> from the command line.



^ permalink raw reply

* Re: [PATCH v3] read_gitfile(): simplify NOT_A_REPO error message
From: Junio C Hamano @ 2026-06-16 15:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git, Tian Yuchen
In-Reply-To: <20260616144554.GA2305974@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Tue, Jun 16, 2026 at 07:25:02AM -0700, Junio C Hamano wrote:
>
>> >     +@@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'git dirs of sibling submodules must not be nested' '
>> >     + test_expect_success 'submodule git dir nesting detection must work with parallel cloning' '
>> >     + 	test_must_fail git clone --recurse-submodules --jobs=2 nested clone_parallel 2>err &&
>> >     + 	cat err &&
>> >     +-	grep -E "(already exists|is inside git dir|not a git repository)" err &&
>> >     ++	grep -E "(already exists|is inside git dir|does not point to a valid repository)" err &&
>> 
>> A few things.
>> 
>>  * Will we be happy to see only one of these possibilities, or do we
>>    expect to see these once for each kind?
>
> I imagine it is only one. This all comes from 9cf8547320 (clone: prevent
> clashing git dirs when cloning submodule in parallel, 2024-01-28), and
> it is expecting the nested path to cause a failure. Which failure I
> guess depends on the racy ordering. If we create the inner one first,
> then we probably get "already exists", and if the outer one, then "is
> inside git dir". I don't know exactly what sequence yields the
> NOT_A_REPO message.
>
> But none of that is changing in this patch, just what the user-visible
> text is for the NOT_A_REPO case.
>
> I did briefly wonder if we might see "not a git repository" from a
> _different_ code path, and need to catch it along with the new message.
> But running successfully with --stress implies that we never see the old
> one anymore.

I see.  Thanks.

>
>>  * a recently started in-flight topic tries to catch bare "grep" and
>>    fails until you write test_grep X-<.
>
> Yeah. This will create a merge conflict for you, but hopefully the
> resolution should be obvious. I don't think it makes sense to fix here,
> as it's orthogonal to the purpose of the patch.

Yup.  I agree that, given that others in the same script will be
updated by that other topic to conflict with this change, it would
not make sense to do the same changes here.

Thanks.

^ permalink raw reply

* Re: [PATCH 4/4] builtin/refs: add "rename" subcommand
From: Junio C Hamano @ 2026-06-16 14:53 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <20260616-pks-refs-writing-subcommands-v1-4-9f5219b6109d@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> +static int cmd_refs_rename(int argc, const char **argv, const char *prefix,
> +			   struct repository *repo)
> +{
> +	static char const * const refs_rename_usage[] = {
> +		REFS_RENAME_USAGE,
> +		NULL
> +	};
> +	const char *message = NULL;
> +	struct option opts[] = {
> +		OPT_STRING(0, "message", &message, N_("reason"),
> +			   N_("reason of the update")),
> +		OPT_END(),
> +	};
> +	const char *oldref, *newref;
> +
> +	argc = parse_options(argc, argv, prefix, opts, refs_rename_usage, 0);
> +	if (argc != 2)
> +		usage(_("rename requires old and new reference name"));
> +	if (message && !*message)
> +		die(_("refusing to perform update with empty message"));
> +
> +	oldref = argv[0];
> +	newref = argv[1];
> +
> +	if (check_refname_format(oldref, 0))
> +		die(_("invalid ref format: %s"), oldref);
> +	if (check_refname_format(newref, 0))
> +		die(_("invalid ref format: %s"), newref);

Do we want to quote the value?  What other subcommands do in "git refs"?

> +	if (!refs_ref_exists(get_main_ref_store(repo), oldref))
> +		die(_("reference does not exist: '%s'"), oldref);
> +	if (refs_ref_exists(get_main_ref_store(repo), newref))
> +		die(_("reference already exists: '%s'"), newref);
> +
> +	return refs_rename_ref(get_main_ref_store(repo), oldref, newref, message);
> +}

I suspect that my version shared the same issue, but doesn't
refs_rename_ref() return -1 for failure, which we may want to turn
to positive 1 before returning?

This is a tangent but git.c:handle_builtin() that calls
git.c:run_builtin() may want to do the "negative return? flip the
polarity" conversion to make this worry go away.  I dunno what such
a change would break, though.

If we rename a ref that does not have a reflog, would it leave the
ref under the new name without reflog, or would we get a reflog with
a single entry that marks the fact the old ref was renamed into the
new ref?  Should that be controlled via --create-reflog option?

^ permalink raw reply

* Re: [PATCH 3/4] builtin/refs: add "update" subcommand
From: Junio C Hamano @ 2026-06-16 14:52 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <20260616-pks-refs-writing-subcommands-v1-3-9f5219b6109d@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

>  git refs delete [--message=<reason>] [--no-deref] <ref> [<oldvalue>]
> +git refs update [--message=<reason>] [--no-deref] [--create-reflog] <ref> <new-value> [<old-value>]

"<old-value> vs <new-value>" is good, we should update "delete" to
use "<old-value>" to match.


>  DESCRIPTION
>  -----------
> @@ -58,6 +59,12 @@ delete::
>  	reference is only deleted after verifying that it currently contains
>  	`<oldvalue>`.
>  
> +update::
> +	Update the given reference to point at `<new-value>`. This subcommand
> +	mirrors `git update-ref` (see linkgit:git-update-ref[1]). When
> +	`<old-value>` is given, the reference is only updated after verifying
> +	that it currently contains `<old-value>`.

As to the lack of "create", among the two potential changes, I have
a slight preference for adding "create" and failing "update" that
does not refer to an existing ref.  If we go that route, the
"--create-reflog" option should move to "create", as "update" will
never be used to create a new ref.


> +	if (repo_get_oid_with_flags(repo, argv[1], &newoid,
> +				    GET_OID_SKIP_AMBIGUITY_CHECK))
> +		die(_("invalid new object ID: %s"), argv[1]);
> +	if (argc == 3 &&
> +	    repo_get_oid_with_flags(repo, argv[2], &oldoid,
> +				    GET_OID_SKIP_AMBIGUITY_CHECK))
> +		die(_("invalid old object ID: %s"), argv[2]);

On the "delete" side, these messages quote the object name, i.e.,

			die(_("invalid old object ID: '%s'"), argv[1]);

We should be consistent.

^ permalink raw reply

* [PATCH v2 6/6] hash-object: add a >4GB/LLP64 test case using filtered input
From: Philip Oakley via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin,
	Philip Oakley
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

To verify that the `clean` side of the `clean`/`smudge` filter code is
correct with regards to LLP64 (read: to ensure that `size_t` is used
instead of `unsigned long`), here is a test case using a trivial filter,
specifically _not_ writing anything to the object store to limit the
scope of the test case.

As in previous commits, the `big` file from previous test cases is
reused if available, to save setup time, otherwise re-generated.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1007-hash-object.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index f96c29ce68..4bc82dd968 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -285,4 +285,16 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
 	test_cmp expect actual
 '
 
+# This clean filter does nothing, other than excercising the interface.
+# We ensure that cleaning doesn't mangle large files on 64-bit Windows.
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
+		'hash filtered files over 4GB correctly' '
+	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
+	test_oid large5GB >expect &&
+	test_config filter.null-filter.clean "cat" &&
+	echo "big filter=null-filter" >.gitattributes &&
+	git hash-object -- big >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v2 5/6] hash-object: add another >4GB/LLP64 test case
From: Philip Oakley via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin,
	Philip Oakley
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

To complement the `--stdin` and `--literally` test cases that verify
that we can hash files larger than 4GB on 64-bit platforms using the
LLP64 data model, here is a test case that exercises `hash-object`
_without_ any options.

Just as before, we use the `big` file from the previous test case if it
exists to save on setup time, otherwise generate it.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1007-hash-object.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index bcae3fc54c..f96c29ce68 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -277,4 +277,12 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
 	test_cmp expect actual
 '
 
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
+		'files over 4GB hash correctly' '
+	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
+	test_oid large5GB >expect &&
+	git hash-object -- big >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 4/6] hash-object --stdin: verify that it works with >4GB/LLP64
From: Philip Oakley via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin,
	Philip Oakley
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

Just like the `hash-object --literally` code path, the `--stdin` code
path also needs to use `size_t` instead of `unsigned long` to represent
memory sizes, otherwise it would cause problems on platforms using the
LLP64 data model (such as Windows).

To limit the scope of the test case, the object is explicitly not
written to the object store, nor are any filters applied.

The `big` file from the previous test case is reused to save setup time;
To avoid relying on that side effect, it is generated if it does not
exist (e.g. when running via `sh t1007-*.sh --long --run=1,41`).

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1007-hash-object.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index f028a1cbcc..bcae3fc54c 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -269,4 +269,12 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
 	test_cmp expect actual
 '
 
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
+		'files over 4GB hash correctly via --stdin' '
+	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
+	test_oid large5GB >expect &&
+	git hash-object --stdin <big >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 3/6] hash algorithms: use size_t for section lengths
From: Philip Oakley via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin,
	Philip Oakley
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

Continue walking the code path for the >4GB `hash-object --literally`
test to the hash algorithm step for LLP64 systems.

This patch lets the SHA1DC code use `size_t`, making it compatible with
LLP64 data models (as used e.g. by Windows).

The interested reader of this patch will note that we adjust the
signature of the `git_SHA1DCUpdate()` function without updating _any_
call site. This certainly puzzled at least one reviewer already, so here
is an explanation:

This function is never called directly, but always via the macro
`platform_SHA1_Update`, which is usually called via the macro
`git_SHA1_Update`. However, we never call `git_SHA1_Update()` directly
in `struct git_hash_algo`. Instead, we call `git_hash_sha1_update()`,
which is defined thusly:

    static void git_hash_sha1_update(git_hash_ctx *ctx,
                                     const void *data, size_t len)
    {
        git_SHA1_Update(&ctx->sha1, data, len);
    }

i.e. it contains an implicit downcast from `size_t` to `unsigned long`
(before this here patch). With this patch, there is no downcast anymore.

With this patch, finally, the t1007-hash-object.sh "files over 4GB hash
literally" test case is fixed.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 object-file.c          | 4 ++--
 sha1dc_git.c           | 3 +--
 sha1dc_git.h           | 2 +-
 t/t1007-hash-object.sh | 2 +-
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index dccbe0fc3e..0056c369ce 100644
--- a/object-file.c
+++ b/object-file.c
@@ -316,7 +316,7 @@ int parse_loose_header(const char *hdr, struct object_info *oi)
 }
 
 static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
-			     const void *buf, unsigned long len,
+			     const void *buf, size_t len,
 			     struct object_id *oid,
 			     char *hdr, size_t *hdrlen)
 {
@@ -336,7 +336,7 @@ void write_object_file_prepare(const struct git_hash_algo *algo,
 	/* Generate the header */
 	*hdrlen = format_object_header(hdr, *hdrlen, type, len);
 
-	/* Sha1.. */
+	/* Hash (function pointers) computation */
 	hash_object_body(algo, &c, buf, len, oid, hdr, hdrlen);
 }
 
diff --git a/sha1dc_git.c b/sha1dc_git.c
index 9b675a046e..fe58d7962a 100644
--- a/sha1dc_git.c
+++ b/sha1dc_git.c
@@ -27,10 +27,9 @@ void git_SHA1DCFinal(unsigned char hash[20], SHA1_CTX *ctx)
 /*
  * Same as SHA1DCUpdate, but adjust types to match git's usual interface.
  */
-void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, unsigned long len)
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, size_t len)
 {
 	const char *data = vdata;
-	/* We expect an unsigned long, but sha1dc only takes an int */
 	while (len > INT_MAX) {
 		SHA1DCUpdate(ctx, data, INT_MAX);
 		data += INT_MAX;
diff --git a/sha1dc_git.h b/sha1dc_git.h
index f6f880cabe..0bcf1aa84b 100644
--- a/sha1dc_git.h
+++ b/sha1dc_git.h
@@ -15,7 +15,7 @@ void git_SHA1DCInit(SHA1_CTX *);
 #endif
 
 void git_SHA1DCFinal(unsigned char [20], SHA1_CTX *);
-void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, unsigned long len);
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, size_t len);
 
 #define platform_SHA_IS_SHA1DC /* used by "test-tool sha1-is-sha1dc" */
 
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 7867fd1dbf..f028a1cbcc 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -261,7 +261,7 @@ test_expect_success '--stdin outside of repository (uses default hash)' '
 	test_cmp expect actual
 '
 
-test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
 		'files over 4GB hash literally' '
 	test-tool genzeros $((5*1024*1024*1024)) >big &&
 	test_oid large5GB >expect &&
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 2/6] object-file.c: use size_t for header lengths
From: Philip Oakley via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin,
	Philip Oakley
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

Continue walking the code path for the >4GB `hash-object --literally`
test. The `hash_object_file_literally()` function internally uses both
`hash_object_file()` and `write_object_file_prepare()`. Both function
signatures use `unsigned long` rather than `size_t` for the mem buffer
sizes. Use `size_t` instead, for LLP64 compatibility.

While at it, convert those function's object's header buffer length to
`size_t` for consistency. The value is already upcast to `uintmax_t` for
print format compatibility.

Note: The hash-object test still does not pass. A subsequent commit
continues to walk the call tree's lower level hash functions to identify
further fixes.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 object-file.c         | 10 +++++-----
 object-file.h         |  6 +++---
 odb/source-files.c    |  2 +-
 odb/source-inmemory.c |  2 +-
 odb/source-loose.c    |  4 ++--
 odb/source.h          |  2 +-
 6 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/object-file.c b/object-file.c
index 9afa842da2..dccbe0fc3e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -318,7 +318,7 @@ int parse_loose_header(const char *hdr, struct object_info *oi)
 static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
 			     const void *buf, unsigned long len,
 			     struct object_id *oid,
-			     char *hdr, int *hdrlen)
+			     char *hdr, size_t *hdrlen)
 {
 	algo->init_fn(c);
 	git_hash_update(c, hdr, *hdrlen);
@@ -327,9 +327,9 @@ static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_c
 }
 
 void write_object_file_prepare(const struct git_hash_algo *algo,
-			       const void *buf, unsigned long len,
+			       const void *buf, size_t len,
 			       enum object_type type, struct object_id *oid,
-			       char *hdr, int *hdrlen)
+			       char *hdr, size_t *hdrlen)
 {
 	struct git_hash_ctx c;
 
@@ -472,11 +472,11 @@ out:
 }
 
 void hash_object_file(const struct git_hash_algo *algo, const void *buf,
-		      unsigned long len, enum object_type type,
+		      size_t len, enum object_type type,
 		      struct object_id *oid)
 {
 	char hdr[MAX_HEADER_LEN];
-	int hdrlen = sizeof(hdr);
+	size_t hdrlen = sizeof(hdr);
 
 	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
 }
diff --git a/object-file.h b/object-file.h
index 528c4e6e69..4c87cd160b 100644
--- a/object-file.h
+++ b/object-file.h
@@ -131,12 +131,12 @@ int finalize_object_file_flags(struct repository *repo,
 			       enum finalize_object_file_flags flags);
 
 void hash_object_file(const struct git_hash_algo *algo, const void *buf,
-		      unsigned long len, enum object_type type,
+		      size_t len, enum object_type type,
 		      struct object_id *oid);
 void write_object_file_prepare(const struct git_hash_algo *algo,
-			       const void *buf, unsigned long len,
+			       const void *buf, size_t len,
 			       enum object_type type, struct object_id *oid,
-			       char *hdr, int *hdrlen);
+			       char *hdr, size_t *hdrlen);
 int write_loose_object(struct odb_source_loose *loose,
 		       const struct object_id *oid, char *hdr,
 		       int hdrlen, const void *buf, unsigned long len,
diff --git a/odb/source-files.c b/odb/source-files.c
index 5bdd042922..3b1261eba9 100644
--- a/odb/source-files.c
+++ b/odb/source-files.c
@@ -159,7 +159,7 @@ static int odb_source_files_freshen_object(struct odb_source *source,
 }
 
 static int odb_source_files_write_object(struct odb_source *source,
-					 const void *buf, unsigned long len,
+					 const void *buf, size_t len,
 					 enum object_type type,
 					 struct object_id *oid,
 					 struct object_id *compat_oid,
diff --git a/odb/source-inmemory.c b/odb/source-inmemory.c
index e004566d76..7f1b6f4636 100644
--- a/odb/source-inmemory.c
+++ b/odb/source-inmemory.c
@@ -227,7 +227,7 @@ static int odb_source_inmemory_count_objects(struct odb_source *source,
 }
 
 static int odb_source_inmemory_write_object(struct odb_source *source,
-					    const void *buf, unsigned long len,
+					    const void *buf, size_t len,
 					    enum object_type type,
 					    struct object_id *oid,
 					    struct object_id *compat_oid UNUSED,
diff --git a/odb/source-loose.c b/odb/source-loose.c
index 7d7ea2fb84..4cfa5b23fc 100644
--- a/odb/source-loose.c
+++ b/odb/source-loose.c
@@ -591,7 +591,7 @@ static int odb_source_loose_freshen_object(struct odb_source *source,
 }
 
 static int odb_source_loose_write_object(struct odb_source *source,
-					 const void *buf, unsigned long len,
+					 const void *buf, size_t len,
 					 enum object_type type, struct object_id *oid,
 					 struct object_id *compat_oid_in,
 					 enum odb_write_object_flags flags)
@@ -601,7 +601,7 @@ static int odb_source_loose_write_object(struct odb_source *source,
 	const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
 	struct object_id compat_oid;
 	char hdr[MAX_HEADER_LEN];
-	int hdrlen = sizeof(hdr);
+	size_t hdrlen = sizeof(hdr);
 
 	/* Generate compat_oid */
 	if (compat) {
diff --git a/odb/source.h b/odb/source.h
index 2192a101b8..1c65a05e2c 100644
--- a/odb/source.h
+++ b/odb/source.h
@@ -199,7 +199,7 @@ struct odb_source {
 	 * return 0 on success, a negative error code otherwise.
 	 */
 	int (*write_object)(struct odb_source *source,
-			    const void *buf, unsigned long len,
+			    const void *buf, size_t len,
 			    enum object_type type,
 			    struct object_id *oid,
 			    struct object_id *compat_oid,
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 1/6] hash-object: demonstrate a >4GB/LLP64 problem
From: Philip Oakley via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin,
	Philip Oakley
In-Reply-To: <pull.2138.v2.git.1781621398.gitgitgadget@gmail.com>

From: Philip Oakley <philipoakley@iee.email>

On LLP64 systems, such as Windows, the size of `long`, `int`, etc. is
only 32 bits (for backward compatibility). Git's use of `unsigned long`
for file memory sizes in many places, rather than size_t, limits the
handling of large files on LLP64 systems (commonly given as `>4GB`).

Provide a minimum test for handling a >4GB file. The `hash-object`
command, with the  `--literally` and without `-w` option avoids
writing the object, either loose or packed. This avoids the code paths
hitting the `bigFileThreshold` config test code, the zlib code, and the
pack code.

Subsequent patches will walk the test's call chain, converting types to
`size_t` (which is larger in LLP64 data models) where appropriate.

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1007-hash-object.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index de076293b6..7867fd1dbf 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -49,6 +49,9 @@ test_expect_success 'setup' '
 
 	example sha1:ddd3f836d3e3fbb7ae289aa9ae83536f76956399
 	example sha256:b44fe1fe65589848253737db859bd490453510719d7424daab03daf0767b85ae
+
+	large5GB sha1:0be2be10a4c8764f32c4bf372a98edc731a4b204
+	large5GB sha256:dc18ca621300c8d3cfa505a275641ebab00de189859e022a975056882d313e64
 	EOF
 '
 
@@ -258,4 +261,12 @@ test_expect_success '--stdin outside of repository (uses default hash)' '
 	test_cmp expect actual
 '
 
+test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
+		'files over 4GB hash literally' '
+	test-tool genzeros $((5*1024*1024*1024)) >big &&
+	test_oid large5GB >expect &&
+	git hash-object --stdin --literally <big >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 0/6] Support hashing objects larger than 4GB on Windows
From: Johannes Schindelin via GitGitGadget @ 2026-06-16 14:49 UTC (permalink / raw)
  To: git; +Cc: Philip Oakley, Patrick Steinhardt, Johannes Schindelin
In-Reply-To: <pull.2138.git.1780593313.gitgitgadget@gmail.com>

Philip Oakley has contributed these patches ~4.5 years ago, and they have
been carried in Git for Windows ever since.

Now that there are already other patch series flying around that try to
address various aspects about >4GB objects (which aren't handled well by Git
until it stops forcing unsigned long to do size_t's job), it seems a good
time to upstream these patches, too, at long last.

Changes since v1:

 * Rebased to current master to resolve the conflicts with
   ps/odb-source-loose
 * Dropped the !LONG_IS_64BIT prereq from the added/touched tests, as it is
   now no longer needed

Philip Oakley (6):
  hash-object: demonstrate a >4GB/LLP64 problem
  object-file.c: use size_t for header lengths
  hash algorithms: use size_t for section lengths
  hash-object --stdin: verify that it works with >4GB/LLP64
  hash-object: add another >4GB/LLP64 test case
  hash-object: add a >4GB/LLP64 test case using filtered input

 object-file.c          | 14 +++++++-------
 object-file.h          |  6 +++---
 odb/source-files.c     |  2 +-
 odb/source-inmemory.c  |  2 +-
 odb/source-loose.c     |  4 ++--
 odb/source.h           |  2 +-
 sha1dc_git.c           |  3 +--
 sha1dc_git.h           |  2 +-
 t/t1007-hash-object.sh | 39 +++++++++++++++++++++++++++++++++++++++
 9 files changed, 56 insertions(+), 18 deletions(-)


base-commit: 700432b2ba22603a0bcb71475c9c333d17c9b0d1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2138%2Fdscho%2FPhilipOakley%2Fhashliteral_t-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2138/dscho/PhilipOakley/hashliteral_t-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2138

Range-diff vs v1:

 1:  84e1cd0aa0 = 1:  9c01bac407 hash-object: demonstrate a >4GB/LLP64 problem
 2:  809d83e46f ! 2:  aa5859c14f object-file.c: use size_t for header lengths
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## object-file.c ##
     -@@ object-file.c: int odb_source_loose_read_object_info(struct odb_source *source,
     +@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi)
       static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
       			     const void *buf, unsigned long len,
       			     struct object_id *oid,
     @@ object-file.c: int odb_source_loose_read_object_info(struct odb_source *source,
      @@ object-file.c: static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_c
       }
       
     - static void write_object_file_prepare(const struct git_hash_algo *algo,
     --				      const void *buf, unsigned long len,
     -+				      const void *buf, size_t len,
     - 				      enum object_type type, struct object_id *oid,
     --				      char *hdr, int *hdrlen)
     -+				      char *hdr, size_t *hdrlen)
     + void write_object_file_prepare(const struct git_hash_algo *algo,
     +-			       const void *buf, unsigned long len,
     ++			       const void *buf, size_t len,
     + 			       enum object_type type, struct object_id *oid,
     +-			       char *hdr, int *hdrlen)
     ++			       char *hdr, size_t *hdrlen)
       {
       	struct git_hash_ctx c;
       
     @@ object-file.c: out:
       
       	write_object_file_prepare(algo, buf, len, type, oid, hdr, &hdrlen);
       }
     -@@ object-file.c: cleanup:
     +
     + ## object-file.h ##
     +@@ object-file.h: int finalize_object_file_flags(struct repository *repo,
     + 			       enum finalize_object_file_flags flags);
     + 
     + void hash_object_file(const struct git_hash_algo *algo, const void *buf,
     +-		      unsigned long len, enum object_type type,
     ++		      size_t len, enum object_type type,
     + 		      struct object_id *oid);
     + void write_object_file_prepare(const struct git_hash_algo *algo,
     +-			       const void *buf, unsigned long len,
     ++			       const void *buf, size_t len,
     + 			       enum object_type type, struct object_id *oid,
     +-			       char *hdr, int *hdrlen);
     ++			       char *hdr, size_t *hdrlen);
     + int write_loose_object(struct odb_source_loose *loose,
     + 		       const struct object_id *oid, char *hdr,
     + 		       int hdrlen, const void *buf, unsigned long len,
     +
     + ## odb/source-files.c ##
     +@@ odb/source-files.c: static int odb_source_files_freshen_object(struct odb_source *source,
     + }
     + 
     + static int odb_source_files_write_object(struct odb_source *source,
     +-					 const void *buf, unsigned long len,
     ++					 const void *buf, size_t len,
     + 					 enum object_type type,
     + 					 struct object_id *oid,
     + 					 struct object_id *compat_oid,
     +
     + ## odb/source-inmemory.c ##
     +@@ odb/source-inmemory.c: static int odb_source_inmemory_count_objects(struct odb_source *source,
       }
       
     - int odb_source_loose_write_object(struct odb_source *source,
     --				  const void *buf, unsigned long len,
     -+				  const void *buf, size_t len,
     - 				  enum object_type type, struct object_id *oid,
     - 				  struct object_id *compat_oid_in,
     - 				  enum odb_write_object_flags flags)
     -@@ object-file.c: int odb_source_loose_write_object(struct odb_source *source,
     + static int odb_source_inmemory_write_object(struct odb_source *source,
     +-					    const void *buf, unsigned long len,
     ++					    const void *buf, size_t len,
     + 					    enum object_type type,
     + 					    struct object_id *oid,
     + 					    struct object_id *compat_oid UNUSED,
     +
     + ## odb/source-loose.c ##
     +@@ odb/source-loose.c: static int odb_source_loose_freshen_object(struct odb_source *source,
     + }
     + 
     + static int odb_source_loose_write_object(struct odb_source *source,
     +-					 const void *buf, unsigned long len,
     ++					 const void *buf, size_t len,
     + 					 enum object_type type, struct object_id *oid,
     + 					 struct object_id *compat_oid_in,
     + 					 enum odb_write_object_flags flags)
     +@@ odb/source-loose.c: static int odb_source_loose_write_object(struct odb_source *source,
       	const struct git_hash_algo *compat = source->odb->repo->compat_hash_algo;
       	struct object_id compat_oid;
       	char hdr[MAX_HEADER_LEN];
     @@ object-file.c: int odb_source_loose_write_object(struct odb_source *source,
       	/* Generate compat_oid */
       	if (compat) {
      
     - ## object-file.h ##
     -@@ object-file.h: int odb_source_loose_freshen_object(struct odb_source *source,
     - 				    const struct object_id *oid);
     - 
     - int odb_source_loose_write_object(struct odb_source *source,
     --				  const void *buf, unsigned long len,
     -+				  const void *buf, size_t len,
     - 				  enum object_type type, struct object_id *oid,
     - 				  struct object_id *compat_oid_in,
     - 				  enum odb_write_object_flags flags);
     -@@ object-file.h: int finalize_object_file_flags(struct repository *repo,
     - 			       enum finalize_object_file_flags flags);
     - 
     - void hash_object_file(const struct git_hash_algo *algo, const void *buf,
     --		      unsigned long len, enum object_type type,
     -+		      size_t len, enum object_type type,
     - 		      struct object_id *oid);
     - 
     - /* Helper to check and "touch" a file */
     + ## odb/source.h ##
     +@@ odb/source.h: struct odb_source {
     + 	 * return 0 on success, a negative error code otherwise.
     + 	 */
     + 	int (*write_object)(struct odb_source *source,
     +-			    const void *buf, unsigned long len,
     ++			    const void *buf, size_t len,
     + 			    enum object_type type,
     + 			    struct object_id *oid,
     + 			    struct object_id *compat_oid,
 3:  253d6f8004 ! 3:  b401eb490f hash algorithms: use size_t for section lengths
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## object-file.c ##
     -@@ object-file.c: int odb_source_loose_read_object_info(struct odb_source *source,
     +@@ object-file.c: int parse_loose_header(const char *hdr, struct object_info *oi)
       }
       
       static void hash_object_body(const struct git_hash_algo *algo, struct git_hash_ctx *c,
     @@ object-file.c: int odb_source_loose_read_object_info(struct odb_source *source,
       			     struct object_id *oid,
       			     char *hdr, size_t *hdrlen)
       {
     -@@ object-file.c: static void write_object_file_prepare(const struct git_hash_algo *algo,
     +@@ object-file.c: void write_object_file_prepare(const struct git_hash_algo *algo,
       	/* Generate the header */
       	*hdrlen = format_object_header(hdr, *hdrlen, type, len);
       
     @@ t/t1007-hash-object.sh: test_expect_success '--stdin outside of repository (uses
       '
       
      -test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     -+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     ++test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
       		'files over 4GB hash literally' '
       	test-tool genzeros $((5*1024*1024*1024)) >big &&
       	test_oid large5GB >expect &&
 4:  ba629a3f03 ! 4:  411727336a hash-object --stdin: verify that it works with >4GB/LLP64
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## t/t1007-hash-object.sh ##
     -@@ t/t1007-hash-object.sh: test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     +@@ t/t1007-hash-object.sh: test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
       	test_cmp expect actual
       '
       
     -+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     ++test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
      +		'files over 4GB hash correctly via --stdin' '
      +	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
      +	test_oid large5GB >expect &&
 5:  f48d570bba ! 5:  e6bb4e6228 hash-object: add another >4GB/LLP64 test case
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## t/t1007-hash-object.sh ##
     -@@ t/t1007-hash-object.sh: test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     +@@ t/t1007-hash-object.sh: test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
       	test_cmp expect actual
       '
       
     -+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     ++test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
      +		'files over 4GB hash correctly' '
      +	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
      +	test_oid large5GB >expect &&
 6:  8a6beeb16d ! 6:  568807ac34 hash-object: add a >4GB/LLP64 test case using filtered input
     @@ Commit message
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## t/t1007-hash-object.sh ##
     -@@ t/t1007-hash-object.sh: test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     +@@ t/t1007-hash-object.sh: test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
       	test_cmp expect actual
       '
       
      +# This clean filter does nothing, other than excercising the interface.
      +# We ensure that cleaning doesn't mangle large files on 64-bit Windows.
     -+test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
     ++test_expect_success EXPENSIVE,SIZE_T_IS_64BIT \
      +		'hash filtered files over 4GB correctly' '
      +	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
      +	test_oid large5GB >expect &&

-- 
gitgitgadget

^ permalink raw reply

* Re: [PATCH 3/6] hash algorithms: use size_t for section lengths
From: Johannes Schindelin @ 2026-06-16 14:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Philip Oakley via GitGitGadget, git, Philip Oakley
In-Reply-To: <ai-5VmawU2MRiAHQ@pks.im>

Hi Patrick,

On Tue, 16 Jun 2026, Patrick Steinhardt wrote:

> On Thu, Jun 04, 2026 at 05:15:09PM +0000, Philip Oakley via GitGitGadget wrote:
> > diff --git a/object-file.c b/object-file.c
> > index 1f5f9daf24..c648cecd80 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -581,7 +581,7 @@ static void write_object_file_prepare(const struct git_hash_algo *algo,
> >  	/* Generate the header */
> >  	*hdrlen = format_object_header(hdr, *hdrlen, type, len);
> >  
> > -	/* Sha1.. */
> > +	/* Hash (function pointers) computation */
> >  	hash_object_body(algo, &c, buf, len, oid, hdr, hdrlen);
> >  }
> >  
> 
> Thanks for updating this comment while at it :)

It wasn't my idea, it was Claude Opus'. I would have left it as-is, but
then decided that it's actually a good change and not worth splitting out
into a separate commit.

> > diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
> > index 7867fd1dbf..10382a815e 100755
> > --- a/t/t1007-hash-object.sh
> > +++ b/t/t1007-hash-object.sh
> > @@ -261,7 +261,7 @@ test_expect_success '--stdin outside of repository (uses default hash)' '
> >  	test_cmp expect actual
> >  '
> >  
> > -test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
> > +test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
> >  		'files over 4GB hash literally' '
> >  	test-tool genzeros $((5*1024*1024*1024)) >big &&
> >  	test_oid large5GB >expect &&
> 
> Previously we required `!LONG_IS_64BIT`, because the test would have
> succeeded on platforms where it is 64 bit wide. But now that this test
> works on all platforms I rather wonder whether we should completely drop
> that prerequisite here, as we expect it to pass regardless of whether or
> not long is 64 bit now.

Good point!

Thank you for the review,
Johannes

^ permalink raw reply

* Re: [PATCH 5/6] hash-object: add another >4GB/LLP64 test case
From: Johannes Schindelin @ 2026-06-16 14:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Philip Oakley via GitGitGadget, git, Philip Oakley
In-Reply-To: <ai-5ZvDgc4smGfGc@pks.im>

Hi Patrick,

On Tue, 16 Jun 2026, Patrick Steinhardt wrote:

> On Thu, Jun 04, 2026 at 05:15:11PM +0000, Philip Oakley via GitGitGadget wrote:
> > diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
> > index 59efee3aff..f2722380ee 100755
> > --- a/t/t1007-hash-object.sh
> > +++ b/t/t1007-hash-object.sh
> > @@ -277,4 +277,12 @@ test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
> >  	test_cmp expect actual
> >  '
> >  
> > +test_expect_success EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
> > +		'files over 4GB hash correctly' '
> > +	{ test -f big || test-tool genzeros $((5*1024*1024*1024)) >big; } &&
> > +	test_oid large5GB >expect &&
> > +	git hash-object -- big >actual &&
> > +	test_cmp expect actual
> > +'
> 
> Same comment here.

[Comment was the suggestion to drop the !LONG_IS_64BIT prerequisite]

Same comment here. [My reply: Good point!]

> Nit: I feel like we could've easily introduced all of these tests in the
> first commit.

Sure, but I actually liked the structuring by Philip when I accepted the
patches into Git for Windows, and I still do: The commits all have
slightly different concerns, and I love that the cognitive load is
lightened somewhat by keeping those concerns in separate patches with
separate commit messages.

Ciao,
Johannes

^ permalink raw reply

* Re: [PATCH 2/6] object-file.c: use size_t for header lengths
From: Johannes Schindelin @ 2026-06-16 14:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Philip Oakley via GitGitGadget, git, Philip Oakley
In-Reply-To: <ai-5XO9gsc_HdMFX@pks.im>

Hi Patrick,

On Mon, 15 Jun 2026, Patrick Steinhardt wrote:

> On Thu, Jun 04, 2026 at 05:15:08PM +0000, Philip Oakley via GitGitGadget wrote:
> > From: Philip Oakley <philipoakley@iee.email>
> > 
> > Continue walking the code path for the >4GB `hash-object --literally`
> > test. The `hash_object_file_literally()` function internally uses both
> > `hash_object_file()` and `write_object_file_prepare()`. Both function
> > signatures use `unsigned long` rather than `size_t` for the mem buffer
> > sizes. Use `size_t` instead, for LLP64 compatibility.
> > 
> > While at it, convert those function's object's header buffer length to
> > `size_t` for consistency. The value is already upcast to `uintmax_t` for
> > print format compatibility.
> 
> One thing I was wondering is whether we should rather migrate to a size
> that is consistent across different platforms. We could e.g. `typedef
> uint64_t objsize_t` and then use that going forward.

No, the point of `size_t` is to represent what the current platform can
handle in-memory. That cannot (and should not) be consolidated.

> I guess the question though is whether that'd buy us anything. In other
> words, are there any platforms that we care about where `size_t` is only
> 32 bit wide? And would such platforms even be able to handle such large
> objects?

There are ways to handle objects larger than 4GB on 32-bit platforms, via
streaming. In those cases, what you need is `off_t`, not `size_t`.

Obviously, there is a large class of problems with such setups. For
example, you can forget about efficiently reconstructing a large Git
object from a delta chain. If you cannot do that in-memory, trying to work
around that limitation merely opens up the user for a world of pain.

Ciao,
Johannes

^ permalink raw reply

* Re: [PATCH v3] read_gitfile(): simplify NOT_A_REPO error message
From: Jeff King @ 2026-06-16 14:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, Tian Yuchen
In-Reply-To: <xmqq7bnya7gh.fsf@gitster.g>

On Tue, Jun 16, 2026 at 07:25:02AM -0700, Junio C Hamano wrote:

> >     +@@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'git dirs of sibling submodules must not be nested' '
> >     + test_expect_success 'submodule git dir nesting detection must work with parallel cloning' '
> >     + 	test_must_fail git clone --recurse-submodules --jobs=2 nested clone_parallel 2>err &&
> >     + 	cat err &&
> >     +-	grep -E "(already exists|is inside git dir|not a git repository)" err &&
> >     ++	grep -E "(already exists|is inside git dir|does not point to a valid repository)" err &&
> 
> A few things.
> 
>  * Will we be happy to see only one of these possibilities, or do we
>    expect to see these once for each kind?

I imagine it is only one. This all comes from 9cf8547320 (clone: prevent
clashing git dirs when cloning submodule in parallel, 2024-01-28), and
it is expecting the nested path to cause a failure. Which failure I
guess depends on the racy ordering. If we create the inner one first,
then we probably get "already exists", and if the outer one, then "is
inside git dir". I don't know exactly what sequence yields the
NOT_A_REPO message.

But none of that is changing in this patch, just what the user-visible
text is for the NOT_A_REPO case.

I did briefly wonder if we might see "not a git repository" from a
_different_ code path, and need to catch it along with the new message.
But running successfully with --stress implies that we never see the old
one anymore.

>  * a recently started in-flight topic tries to catch bare "grep" and
>    fails until you write test_grep X-<.

Yeah. This will create a merge conflict for you, but hopefully the
resolution should be obvious. I don't think it makes sense to fix here,
as it's orthogonal to the purpose of the patch.

-Peff

^ permalink raw reply

* Re: [PATCH v3] read_gitfile(): simplify NOT_A_REPO error message
From: Junio C Hamano @ 2026-06-16 14:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git, Tian Yuchen
In-Reply-To: <20260616123516.GA2301231@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Tue, Jun 16, 2026 at 07:19:20AM -0400, Jeff King wrote:
>
>> Here it is.

Thanks.

>     +@@ t/t7450-bad-git-dotfiles.sh: test_expect_success 'git dirs of sibling submodules must not be nested' '
>     + test_expect_success 'submodule git dir nesting detection must work with parallel cloning' '
>     + 	test_must_fail git clone --recurse-submodules --jobs=2 nested clone_parallel 2>err &&
>     + 	cat err &&
>     +-	grep -E "(already exists|is inside git dir|not a git repository)" err &&
>     ++	grep -E "(already exists|is inside git dir|does not point to a valid repository)" err &&

A few things.

 * Will we be happy to see only one of these possibilities, or do we
   expect to see these once for each kind?

 * a recently started in-flight topic tries to catch bare "grep" and
   fails until you write test_grep X-<.

> We also racily trigger this in t7450. During parallel cloning we might
> see one of several errors, including this one. And so we must update
> that message, too (you can otherwise find the failure pretty quickly by
> running t7450 with --stress).


^ permalink raw reply

* Re: [PATCH] Add a test about broken notes handling on rebase
From: Phillip Wood @ 2026-06-16 13:12 UTC (permalink / raw)
  To: Uwe Kleine-König, git
In-Reply-To: <20260612143952.3281115-2-u.kleine-koenig@baylibre.com>

On 12/06/2026 15:39, Uwe Kleine-König wrote:
> When a commit disappears during rebase because the patch content is
> already there (but not by the same patch in which case the commit would
> be skipped) the notes of that disappearing commit still survives and is
> added to the (rebased) parent of the disappearing commit.
> 
> So with the commit graph
> 
>   A -- B -- C
>    `
>     `-BD
> 
> where BD includes the changes done in B, when rebasing C on top of BD,
> the note for B should disappear and not be added to BD.
> 
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
> ---
> Hello,
> 
> this is a behaviour of git that really bothers me when working on big
> patch series. I use notes to track the Message-Id of the patches when I
> send them out. Then when rebasing to a newer upstream version, the
> tracking gets confused because the Message-Id notes end up on commits
> that were not sent out yet (or I got two Message-Ids in them).
> 
> I reported that already back in 2023[1],

That thread includes a suggestion on how to fix it if anyone reading 
this is interesting in working on it.

> but obviously not in a way that
> resulted in a fix. So I'm trying again with a patch that adds a failing
> test.

I'm not sure carrying this test makes it any more likely that it will be 
fixed, though your mail might get someone interested in fixing it. Don't 
we already have some relevant tests t3400 rather than adding a whole new 
file for a single test?

Thanks

Phillip

> Best regards
> Uwe
> 
> [1] https://lore.kernel.org/git/20230530092155.3zbb5uxa7eisdzxb@pengutronix.de/
> 
>   t/meson.build           |  1 +
>   t/t3322-notes-rebase.sh | 35 +++++++++++++++++++++++++++++++++++
>   2 files changed, 36 insertions(+)
>   create mode 100644 t/t3322-notes-rebase.sh
> 
> diff --git a/t/meson.build b/t/meson.build
> index c5832fee0535..6927bd9c794f 100644
> --- a/t/meson.build
> +++ b/t/meson.build
> @@ -358,6 +358,7 @@ integration_tests = [
>     't3311-notes-merge-fanout.sh',
>     't3320-notes-merge-worktrees.sh',
>     't3321-notes-stripspace.sh',
> +  't3322-notes-rebase.sh',
>     't3400-rebase.sh',
>     't3401-rebase-and-am-rename.sh',
>     't3402-rebase-merge.sh',
> diff --git a/t/t3322-notes-rebase.sh b/t/t3322-notes-rebase.sh
> new file mode 100644
> index 000000000000..64c40a523b50
> --- /dev/null
> +++ b/t/t3322-notes-rebase.sh
> @@ -0,0 +1,35 @@
> +#!/bin/sh
> +
> +test_description='Test notes on rebase'
> +
> +. ./test-lib.sh
> +
> +test_expect_success setup '
> +	git init &&
> +	echo A > A &&
> +	git add A &&
> +	git commit -m A &&
> +	git branch branch &&
> +	echo B > B &&
> +	git add B &&
> +	git commit -m B &&
> +	git notes add -m "This is B" @ &&
> +	echo C > C &&
> +	git add C &&
> +	git commit -m C &&
> +	git checkout branch &&
> +	echo B > B &&
> +	echo D > D &&
> +	git add B D &&
> +	git commit -m BD
> +'
> +
> +test_expect_success 'rebase B + C on top of BD' '
> +	git rebase @ master
> +'
> +
> +test_expect_failure 'assert there is no note on BD' '
> +	git notes show branch
> +'
> +
> +test_done
> 
> base-commit: 3e65291872de10c3f0bf05ea8c24187e7a71ebf0


^ permalink raw reply

* Re: [GSoC Patch v5 2/4] rev-parse: use append_formatted_path() for path formatting
From: Phillip Wood @ 2026-06-16 13:08 UTC (permalink / raw)
  To: K Jayatheerth, git
  Cc: jltobler, lucasseikioshiro, gitster, phillip.wood, sandals,
	kumarayushjha123, a3205153416
In-Reply-To: <20260616044953.184806-3-jayatheerthkulkarni2005@gmail.com>

On 16/06/2026 05:49, K Jayatheerth wrote:
> -static void print_path(const char *path, const char *prefix, enum format_type format, enum default_type def)
> +static void print_path(const char *path, const char *prefix,
> +		       enum path_format arg_path_format, enum path_format def_format)
>   {
> -	char *cwd = NULL;
> -	/*
> -	 * We don't ever produce a relative path if prefix is NULL, so set the
> -	 * prefix to the current directory so that we can produce a relative
> -	 * path whenever possible.  If we're using RELATIVE_IF_SHARED mode, then
> -	 * we want an absolute path unless the two share a common prefix, so don't
> -	 * set it in that case, since doing so causes a relative path to always
> -	 * be produced if possible.
> -	 */
> -	if (!prefix && (format != FORMAT_DEFAULT || def != DEFAULT_RELATIVE_IF_SHARED))
> -		prefix = cwd = xgetcwd();
> -	if (format == FORMAT_DEFAULT && def == DEFAULT_UNMODIFIED) {
> -		puts(path);
> -	} else if (format == FORMAT_RELATIVE ||
> -		  (format == FORMAT_DEFAULT && def == DEFAULT_RELATIVE)) {
> -		/*
> -		 * In order for relative_path to work as expected, we need to
> -		 * make sure that both paths are absolute paths.  If we don't,
> -		 * we can end up with an unexpected absolute path that the user
> -		 * didn't want.
> -		 */
> -		struct strbuf buf = STRBUF_INIT, realbuf = STRBUF_INIT, prefixbuf = STRBUF_INIT;
> -		if (!is_absolute_path(path)) {
> -			strbuf_realpath_forgiving(&realbuf, path,  1);
> -			path = realbuf.buf;
> -		}
> -		if (!is_absolute_path(prefix)) {
> -			strbuf_realpath_forgiving(&prefixbuf, prefix, 1);
> -			prefix = prefixbuf.buf;
> -		}
> -		puts(relative_path(path, prefix, &buf));
> -		strbuf_release(&buf);
> -		strbuf_release(&realbuf);
> -		strbuf_release(&prefixbuf);
> -	} else if (format == FORMAT_DEFAULT && def == DEFAULT_RELATIVE_IF_SHARED) {
> -		struct strbuf buf = STRBUF_INIT;
> -		puts(relative_path(path, prefix, &buf));
> -		strbuf_release(&buf);
> -	} else {
> -		struct strbuf buf = STRBUF_INIT;
> -		strbuf_realpath_forgiving(&buf, path, 1);
> -		puts(buf.buf);
> -		strbuf_release(&buf);
> -	}
> -	free(cwd);
> +	struct strbuf sb = STRBUF_INIT;
> +	/* If the user didn't explicitly specify a format, fallback to the path-specific default. */
> +	enum path_format fmt = (arg_path_format != PATH_FORMAT_DEFAULT) ? arg_path_format : def_format;
> +
> +	append_formatted_path(&sb, path, prefix, fmt);
> +	puts(sb.buf);
> +
> +	strbuf_release(&sb);
>   }
>   
>   int cmd_rev_parse(int argc,
> @@ -717,7 +661,7 @@ int cmd_rev_parse(int argc,
>   	const char *name = NULL;
>   	struct strbuf buf = STRBUF_INIT;
>   	int seen_end_of_options = 0;
> -	enum format_type format = FORMAT_DEFAULT;
> +	enum path_format arg_path_format = PATH_FORMAT_DEFAULT;

This is the source of the api wart I referred to in the previous patch. 
Could we keep the existing enums and convert them into the appropriate 
PATH_FORMAT_* flag in print_path() above? I think we already have the 
logic to do that in the existing code. That would mean that other users 
of append_formatted_path() don't have to worry about the extra flag.

Thanks

Phillip

>   
>   	show_usage_if_asked(argc, argv, builtin_rev_parse_usage);
>   
> @@ -797,8 +741,8 @@ int cmd_rev_parse(int argc,
>   					die(_("--git-path requires an argument"));
>   				print_path(repo_git_path_replace(the_repository, &buf,
>   								 "%s", argv[i + 1]), prefix,
> -						format,
> -						DEFAULT_RELATIVE_IF_SHARED);
> +						arg_path_format,
> +						PATH_FORMAT_RELATIVE_IF_SHARED);
>   				i++;
>   				continue;
>   			}
> @@ -820,9 +764,9 @@ int cmd_rev_parse(int argc,
>   				if (!arg)
>   					die(_("--path-format requires an argument"));
>   				if (!strcmp(arg, "absolute")) {
> -					format = FORMAT_CANONICAL;
> +					arg_path_format = PATH_FORMAT_CANONICAL;
>   				} else if (!strcmp(arg, "relative")) {
> -					format = FORMAT_RELATIVE;
> +					arg_path_format = PATH_FORMAT_RELATIVE;
>   				} else {
>   					die(_("unknown argument to --path-format: %s"), arg);
>   				}
> @@ -985,7 +929,7 @@ int cmd_rev_parse(int argc,
>   			if (!strcmp(arg, "--show-toplevel")) {
>   				const char *work_tree = repo_get_work_tree(the_repository);
>   				if (work_tree)
> -					print_path(work_tree, prefix, format, DEFAULT_UNMODIFIED);
> +					print_path(work_tree, prefix, arg_path_format, PATH_FORMAT_UNMODIFIED);
>   				else
>   					die(_("this operation must be run in a work tree"));
>   				continue;
> @@ -993,7 +937,7 @@ int cmd_rev_parse(int argc,
>   			if (!strcmp(arg, "--show-superproject-working-tree")) {
>   				struct strbuf superproject = STRBUF_INIT;
>   				if (get_superproject_working_tree(&superproject))
> -					print_path(superproject.buf, prefix, format, DEFAULT_UNMODIFIED);
> +					print_path(superproject.buf, prefix, arg_path_format, PATH_FORMAT_UNMODIFIED);
>   				strbuf_release(&superproject);
>   				continue;
>   			}
> @@ -1028,18 +972,18 @@ int cmd_rev_parse(int argc,
>   				const char *gitdir = getenv(GIT_DIR_ENVIRONMENT);
>   				char *cwd;
>   				int len;
> -				enum format_type wanted = format;
> +				enum path_format wanted = arg_path_format;
>   				if (arg[2] == 'g') {	/* --git-dir */
>   					if (gitdir) {
> -						print_path(gitdir, prefix, format, DEFAULT_UNMODIFIED);
> +						print_path(gitdir, prefix, arg_path_format, PATH_FORMAT_UNMODIFIED);
>   						continue;
>   					}
>   					if (!prefix) {
> -						print_path(".git", prefix, format, DEFAULT_UNMODIFIED);
> +						print_path(".git", prefix, arg_path_format, PATH_FORMAT_UNMODIFIED);
>   						continue;
>   					}
>   				} else {		/* --absolute-git-dir */
> -					wanted = FORMAT_CANONICAL;
> +					wanted = PATH_FORMAT_CANONICAL;
>   					if (!gitdir && !prefix)
>   						gitdir = ".git";
>   					if (gitdir) {
> @@ -1055,11 +999,11 @@ int cmd_rev_parse(int argc,
>   				strbuf_reset(&buf);
>   				strbuf_addf(&buf, "%s%s.git", cwd, len && cwd[len-1] != '/' ? "/" : "");
>   				free(cwd);
> -				print_path(buf.buf, prefix, wanted, DEFAULT_CANONICAL);
> +				print_path(buf.buf, prefix, wanted, PATH_FORMAT_CANONICAL);
>   				continue;
>   			}
>   			if (!strcmp(arg, "--git-common-dir")) {
> -				print_path(repo_get_common_dir(the_repository), prefix, format, DEFAULT_RELATIVE_IF_SHARED);
> +				print_path(repo_get_common_dir(the_repository), prefix, arg_path_format, PATH_FORMAT_RELATIVE_IF_SHARED);
>   				continue;
>   			}
>   			if (!strcmp(arg, "--is-inside-git-dir")) {
> @@ -1089,7 +1033,7 @@ int cmd_rev_parse(int argc,
>   				if (the_repository->index->split_index) {
>   					const struct object_id *oid = &the_repository->index->split_index->base_oid;
>   					const char *path = repo_git_path_replace(the_repository, &buf, "sharedindex.%s", oid_to_hex(oid));
> -					print_path(path, prefix, format, DEFAULT_RELATIVE);
> +					print_path(path, prefix, arg_path_format, PATH_FORMAT_RELATIVE);
>   				}
>   				continue;
>   			}


^ permalink raw reply

* Re: [GSoC Patch v5 1/4] path: introduce append_formatted_path() for shared path formatting
From: Phillip Wood @ 2026-06-16 13:08 UTC (permalink / raw)
  To: K Jayatheerth, git
  Cc: jltobler, lucasseikioshiro, gitster, phillip.wood, sandals,
	kumarayushjha123, a3205153416
In-Reply-To: <20260616044953.184806-2-jayatheerthkulkarni2005@gmail.com>

On 16/06/2026 05:49, K Jayatheerth wrote:
> The path-formatting logic in builtin/rev-parse.c is tightly coupled
> to that command and writes directly to stdout, making it impossible
> for other builtins to reuse.
> 
> Extract the core algorithm into append_formatted_path() in path.c
> and expose a path_format enum in path.h so that any builtin can
> format paths consistently without duplicating logic.

Sorry I haven't had time to look at this series recently, it is looking 
much nicer now that we have a single enum. It would be helpful to 
explain why we need PATH_FORMAT_DEFAULT that acts exactly like 
PATH_FORMAT_UNMODIFIED. Looking at the next patch it seems this is still 
a wart in the api due to rev-parse wanting needing to distinguish the 
unmodified case from the default case.

Thanks

Phillip

> Mentored-by: Justin Tobler <jltobler@gmail.com>
> Mentored-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
> Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
> ---
>   path.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   path.h | 36 ++++++++++++++++++++++++++++++
>   2 files changed, 106 insertions(+)
> 
> diff --git a/path.c b/path.c
> index d7e17bf174..5e83e3e4f6 100644
> --- a/path.c
> +++ b/path.c
> @@ -1579,6 +1579,76 @@ char *xdg_cache_home(const char *filename)
>   	return NULL;
>   }
>   
> +void append_formatted_path(struct strbuf *dest, const char *path,
> +			   const char *prefix, enum path_format format)
> +{
> +	switch (format) {
> +	case PATH_FORMAT_DEFAULT:
> +	case PATH_FORMAT_UNMODIFIED:
> +		strbuf_addstr(dest, path);
> +		break;
> +
> +	case PATH_FORMAT_RELATIVE: {
> +		struct strbuf relative_buf = STRBUF_INIT;
> +		struct strbuf real_path = STRBUF_INIT;
> +		struct strbuf real_prefix = STRBUF_INIT;
> +		char *cwd = NULL;
> +
> +		/*
> +		 * We don't ever produce a relative path if prefix is NULL,
> +		 * so set the prefix to the current directory so that we can
> +		 * produce a relative path whenever possible.
> +		 */
> +		if (!prefix)
> +			prefix = cwd = xgetcwd();
> +
> +		if (!is_absolute_path(path)) {
> +			strbuf_realpath_forgiving(&real_path, path, 1);
> +			path = real_path.buf;
> +		}
> +		if (!is_absolute_path(prefix)) {
> +			strbuf_realpath_forgiving(&real_prefix, prefix, 1);
> +			prefix = real_prefix.buf;
> +		}
> +
> +		strbuf_addstr(dest, relative_path(path, prefix, &relative_buf));
> +
> +		strbuf_release(&relative_buf);
> +		strbuf_release(&real_path);
> +		strbuf_release(&real_prefix);
> +		free(cwd);
> +		break;
> +	}
> +
> +	case PATH_FORMAT_RELATIVE_IF_SHARED: {
> +		struct strbuf relative_buf = STRBUF_INIT;
> +
> +		/*
> +		 * If we're using RELATIVE_IF_SHARED mode, then we want an
> +		 * absolute path unless the two share a common prefix, so don't
> +		 * default the prefix to the current working directory. Doing so
> +		 * would cause a relative path to always be produced if possible.
> +		 */
> +		strbuf_addstr(dest, relative_path(path, prefix, &relative_buf));
> +		strbuf_release(&relative_buf);
> +		break;
> +	}
> +
> +	case PATH_FORMAT_CANONICAL: {
> +		struct strbuf canonical_buf = STRBUF_INIT;
> +
> +		strbuf_realpath_forgiving(&canonical_buf, path, 1);
> +		strbuf_addbuf(dest, &canonical_buf);
> +
> +		strbuf_release(&canonical_buf);
> +		break;
> +	}
> +
> +	default:
> +		BUG("unknown path_format value %d", format);
> +	}
> +}
> +
>   REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
>   REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
>   REPO_GIT_PATH_FUNC(merge_rr, "MERGE_RR")
> diff --git a/path.h b/path.h
> index 0434ba5e07..6aca53b100 100644
> --- a/path.h
> +++ b/path.h
> @@ -262,6 +262,42 @@ enum scld_error safe_create_leading_directories_no_share(char *path);
>   int safe_create_file_with_leading_directories(struct repository *repo,
>   					      const char *path);
>   
> +/**
> + * The formatting strategy to apply when writing a path into a buffer.
> + */
> +enum path_format {
> +	/*
> +	 * Represents the default formatting behavior. Treated as
> +	 * PATH_FORMAT_UNMODIFIED by append_formatted_path().
> +	 */
> +	PATH_FORMAT_DEFAULT,
> +
> +	/* Output the path exactly as-is without any modifications. */
> +	PATH_FORMAT_UNMODIFIED,
> +
> +	/* Output a path relative to the provided directory prefix. */
> +	PATH_FORMAT_RELATIVE,
> +
> +	/* Output a relative path only if the path shares a root with the prefix. */
> +	PATH_FORMAT_RELATIVE_IF_SHARED,
> +
> +	/* Output a fully resolved, absolute canonical path. */
> +	PATH_FORMAT_CANONICAL
> +};
> +
> +/**
> + * Format a path according to the specified formatting strategy and append
> + * the result to the given strbuf.
> + *
> + * `dest`   : The string buffer to append the formatted path to.
> + * `path`   : The path string that needs to be formatted.
> + * `prefix` : The directory prefix to calculate relative offsets against.
> + * Pass NULL to default to the current working directory where applicable.
> + * `format` : The formatting behavior rule to execute.
> + */
> +void append_formatted_path(struct strbuf *dest, const char *path,
> +			   const char *prefix, enum path_format format);
> +
>   # ifdef USE_THE_REPOSITORY_VARIABLE
>   #  include "strbuf.h"
>   #  include "repository.h"


^ permalink raw reply

* Re: [PATCH v5 2/2] graph: indent visual root in graph
From: Pablo Sabater @ 2026-06-16 13:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Taylor Blau, git, ayu.chandekar, chandrapratap3519,
	christian.couder, jltobler, karthik.188, peff, phillip.wood,
	siddharthasthana31
In-Reply-To: <xmqqzf0vbyj8.fsf@gitster.g>

El lun, 15 jun 2026 a las 17:42, Junio C Hamano (<gitster@pobox.com>) escribió:
>
> Pablo Sabater <pabloosabaterr@gmail.com> writes:
>
> > It does not make it unpredictable but it makes it not output what I
> > wanted to test, what I wanted to test is having an active column at
> > the same time that visual roots in different cases were being rendered
> > on another column.
>
> Oh, use of commit-graph changes the traversal order, which would
> affect how the graph is drawn, and there is no way to ensure that we
> traverse in the same way with or without commit-graph?  That's
> inconvenient.  But even without commit-graph, do we guarantee the
> same traversal order forever?  I doubt it.  So I suspect that it is
> a brittle workaround to disable commit-graph in the longer term.

Hi!

About the traversal order, aren't all the graph tests dependent on the
traversal order?  If it changed they would all need to be updated
because the tests are hardcoded expects of the graph.
I guess it might be more brittle than other graph tests specially
because it also depends on removing files, I tried using "git config
core.commitGraph false" or "--date-order" but I still get different
results and removing the files fixed it. If someone knows a better way
of doing it I'm happy to change it.

>
> As long as the graph engine shows correct graph no matter what order
> the commits come out of the revision traversal engine, we won't hurt
> end-users, but we need our tests to be reproducible, so that is a
> bit unfortunate.
>
> Anyway, stepping back a bit,
>
> > However having GIT_TEST_COMMIT_GRAPH in the last
> > text for example changes from:
> >
> > * 41_octopus
> > | * 43_B
> > |  \
> > |   * 43_A
> > | * 42_B
> > | * 42_A
> > * 41_B
> > * 41_A
>
> Does the "vertically aligned * on 2nd and later columns do not mean
> any parent-child relationship" rule no longer apply in this version?
> IOW, does the above graph show that
>
>  - 41_A is a parent of 41_B, which is a parent of 41_octopus
>  - 42_A is a parent of 42_B, and
>  - 43_A is a parent of 43_B but is not related to 42_B

Yes, this means that all commits vertically adjacent are related,
those who are not related and can cause that ambiguity get indented
(43_A).

>
> ?  Who are the parents of 41_octopus?  It has no relationship with
> 42_B and 43_B, and unlike what its name suggests, it has only 41_b
> as its parent (probably with history simplification that makes only
> these commits shown)?

On this test we are using "--first-parent" which excludes all the
parents but the first one, but later we force its excluded parents to
be shown.
We exclude 42_* and 43_* branches and then force them to appear as
unrelated branches.

>
> > to:
> >
> > * 41_octopus
> > * 41_B
> >  \
> >   * 41_A
> > * 43_B
> >  \
> >   * 43_A
> > * 42_B
> > * 42_A
>
> And this graph shows the same inter-commit relationship.  So both
> are correctly showing what we want to express, but they show the
> same information differently, making test_cmp unhappy?

Yes they show the same information.  On the second graph every commit
is on the first column (or second if they get indented) but on the
first graph we have:

*
| * <- visual root on second column
^
`----- first column remains active

If you tested v3 with this case you would see that it assumes that
visual roots only happen to be rendered on the first column, therefore
failing to correctly indent those visual roots on the second column,
which this test proves that they can appear on other columns.

Back to the test:

  * 41_octopus
  | * 43_B
  |  \
  |   * 43_A
  | * 42_B
  | * 42_A
  * 41_B
  * 41_A

43_A is rendered on the second column (first column is active by the
41_* branch) and gets indented to the third one. With commit-graph it
would be on the first and get indented to the second, making it the
same as more general tests above in "t4218", it is an edge case but
shows that indentation works correctly independently where the visual
root is.

>
> Thanks.

Thanks,

Pablo

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox