Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH v4 00/13] pack-objects: integrate --path-walk and some --filter options
From: Derrick Stolee @ 2026-05-21 23:01 UTC (permalink / raw)
  To: Taylor Blau, Derrick Stolee via GitGitGadget
  Cc: git, christian.couder, gitster, johannes.schindelin, johncai86,
	karthik.188, kristofferhaugsbakk, newren, peff, ps
In-Reply-To: <agz3/ZxZZHBKofR9@nand.local>

On 5/19/26 7:53 PM, Taylor Blau wrote:
> On Wed, May 13, 2026 at 09:18:42PM +0000, Derrick Stolee via GitGitGadget wrote:
>> UPDATES IN V4
>> =============
>>
>> Thanks, Taylor for the careful review.
>>
>>   * Several typos are fixed.
>>   * The performance test is corrected for issues around piping Git commands
>>     and made more robust to the existence of submodules.
>>   * BIG: The tree:0 patch is significantly updated in this version. Taylor
>>     correctly smelled a problem with the new logic to emit the /tagged-trees
>>     object set, and that signaled that those trees were previously never
>>     emitted. I update the test to demonstrate that changing the data shape
>>     (including tagged trees that are otherwise-unreachable) doesn't change
>>     the test behavior, signaling a bug. The behavior change details all the
>>     complexities of visiting only directly-requested trees under a tree:0
>>     filter and recursing on all trees in other cases.
> 
> Thanks for the new round; I gave this a lighter pass since I had
> reviewed v3 in detail and the range-diff here looks good. I focused in
> on a few patches in particular, and left a couple of minor comments.
> 
> My main reservation is that the "path starts with a '/' slash character
> when directly requested" behavior feels brittle to me, and I am not sure
> if there is a cleaner way to express that.
> 
> I'm curious what your thoughts are there. I think barring that things
> are near-complete here, though I did note one issue with the t/perf
> changes (that is my fault for having a bad suggestion on the earlier
> round).

I like the suggested change to t/perf but I don't share your concerns
around the '/' character in the path (I go deeper into why in the
thread).

Thanks,
-Stolee


^ permalink raw reply

* Re: [PATCH v11] checkout: extend --track with a "fetch" mode to refresh start-point
From: Junio C Hamano @ 2026-05-21 23:53 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Harald Nordgren via GitGitGadget, git, Ramsay Jones,
	D. Ben Knoble, Kristoffer Haugsbakk, Marc Branchaud,
	Harald Nordgren
In-Reply-To: <01526f43-86aa-466f-a1e8-054284e1a2e1@gmail.com>

Phillip Wood <phillip.wood123@gmail.com> writes:
> Don't we want to avoid creating a branch with an ambiguous upstream so 
> that a subsequent "git pull" works though? Looking at 
> branch.c:setup_tracking() it seems to reject upstream branches that 
> match more than one remote.

Yes.  We may try to avoid creating misconfigured branches in the
first place, but we'd want to make sure other parts of the system
that consumes existing branches to behave as sensible as possible,
no?

Thanks.

^ permalink raw reply

* Re: [PATCH 4/9] run-command: add support for timeout in command finisher
From: Junio C Hamano @ 2026-05-22  0:10 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Siddh Raman Pant, git@vger.kernel.org, newren@gmail.com,
	ps@pks.im, oswald.buddenhagen@gmx.de, code@khaugsbakk.name
In-Reply-To: <cf52154c-1275-4a4b-957e-5aa17f22705c@kdbg.org>

Johannes Sixt <j6t@kdbg.org> writes:

> Am 21.05.26 um 11:59 schrieb Siddh Raman Pant:
>> The timeout is for the failure path, where the external helper has
>> already stopped following that protocol or is blocked on something
>> outside git's control. Since git starts the helper and puts it on the
>> log/grep path, git also needs a bounded way to recover when that helper
>> does not make progress. Otherwise an optional note source can prevent
>> the main git command from completing.
>
> That Git communicates with a process that looks like it stopped is the
> normal case, for example:
>
> - Output is sent to the pager. The user can take their time to study the
> output. All the while, git waits patiently for the user to advance the
> pager.
>
> - Git fetch transfers large amounts of data across the network. Most of
> the time it waits for data to arrive and does nothing. The peer process
> looks like it hangs. Git does not decide to kill the connection at any
> time. It is the user's decision to do so.
>
> If the notes provider hangs, then it is not on Git to decide when it has
> waited long enough.

It is often the sticking sore point that there is no good timeout
value that suites for everybody.

If a protocol builds its own way to declare "this backend is slow,
so please do not consider less than 3 seconds of nonaction something
to worry about but kill it off if you waited more than that" to make
the receiving/waiting end responsible for managing timeout, that
might be workable, but it certainly feels like a kludge.  The
protocol can instead allow an "error - for your particular request,
we couldn't come up with an answer within a reasonable time limit"
response (in practice, "within time limit" does not have to be the
only reason for such an error) to be returned, I think.

^ permalink raw reply

* Re: [PATCH 0/8] setup: centralize object database creation
From: Junio C Hamano @ 2026-05-22  0:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <20260521-b4-pks-setup-centralize-odb-creation-v1-0-f130d2a7e8ae@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> The series is built on top of aec3f58750 (Sync with 'maint', 2026-05-21)
> with ps/setup-wo-the-repository at df69f40c34 (setup: stop using
> `the_repository` in `init_db()`, 2026-05-19) merged into it.

FWIW, this merge needs the following merge-fix squashed into it,
for the topic to build standalone.

commit ce350f62ceb26f3276ea3b7ad78b7f8cb4c35cf7
Author: Junio C Hamano <gitster@pobox.com>
Date:   Wed May 13 12:20:29 2026 +0900

    merge-fix/ps/setup-wo-the-repository
    
    with  js/objects-larger-than-4gb-on-windows

diff --git a/t/helper/test-synthesize.c b/t/helper/test-synthesize.c
index 1f28ecf0f2..3fa534fbdf 100644
--- a/t/helper/test-synthesize.c
+++ b/t/helper/test-synthesize.c
@@ -506,7 +506,7 @@ static int cmd__synthesize__pack(int argc, const char **argv,
 		OPT_END()
 	};
 
-	setup_git_directory_gently(&non_git);
+	setup_git_directory_gently(the_repository, &non_git);
 	repo = the_repository;
 	algo = unsafe_hash_algo(repo->hash_algo);
 

^ permalink raw reply related

* Re: [PATCH v10 2/4] branch: add --prune-merged <branch>
From: Junio C Hamano @ 2026-05-22  1:17 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget
  Cc: git, Kristoffer Haugsbakk, Johannes Sixt, Phillip Wood,
	Harald Nordgren
In-Reply-To: <718e28c7e0120a826385189213cccec1f0fce1af.1779403204.git.gitgitgadget@gmail.com>

"Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +`--prune-merged`::
> +	Delete the local branches that `--forked` would list for
> +	the same _<branch>_ arguments, but only those whose tip is
> +	reachable from their configured upstream.
> ++
> +For arguments that refer to remote-tracking branches, run
> +`git fetch` first ...

Please don't.  

"git branch -d <derived>" checks if <derived> has already been
merged to its <upstream> as your repository currently sees it, and
this makes "--prune-merged" inconsistent.  Before deciding to use
"--prune-merged", the user may have sanity checked if it is safe to
remove by running "git branch -no-merged", whose answer hence user's
sanity checking will be invalidated by your auto-update from the
upstream.

It also means that you cannot prune already merged ones while you
are not online.

^ permalink raw reply

* Re: [PATCH v10 1/4] branch: add --forked <branch>
From: Junio C Hamano @ 2026-05-22  1:52 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget
  Cc: git, Kristoffer Haugsbakk, Johannes Sixt, Phillip Wood,
	Harald Nordgren
In-Reply-To: <f2df15983067ce39b6c33ab81115863d5c3567f4.1779403204.git.gitgitgadget@gmail.com>

"Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/builtin/branch.c b/builtin/branch.c
> index 1572a4f9ef..1e24c95a69 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -28,6 +28,7 @@
>  #include "help.h"
>  #include "advice.h"
>  #include "commit-reach.h"
> +#include "wildmatch.h"
>  
>  static const char * const builtin_branch_usage[] = {
>  	N_("git branch [<options>] [-r | -a] [--merged] [--no-merged]"),
> @@ -38,6 +39,7 @@ static const char * const builtin_branch_usage[] = {
>  	N_("git branch [<options>] (-c | -C) [<old-branch>] <new-branch>"),
>  	N_("git branch [<options>] [-r | -a] [--points-at]"),
>  	N_("git branch [<options>] [-r | -a] [--format]"),
> +	N_("git branch [<options>] --forked <branch>..."),
>  	NULL
>  };
>  
> @@ -191,7 +193,8 @@ static int branch_merged(int kind, const char *name,
>  
>  static int check_branch_commit(const char *branchname, const char *refname,
>  			       const struct object_id *oid, struct commit *head_rev,
> -			       int kinds, int force)
> +			       int kinds, int force, int warn_only,
> +			       int *n_not_merged)
>  {
>  	struct commit *rev = lookup_commit_reference(the_repository, oid);
>  	if (!force && !rev) {
> @@ -199,10 +202,18 @@ static int check_branch_commit(const char *branchname, const char *refname,
>  		return -1;
>  	}
>  	if (!force && !branch_merged(kinds, branchname, rev, head_rev)) {
> -		error(_("the branch '%s' is not fully merged"), branchname);
> -		advise_if_enabled(ADVICE_FORCE_DELETE_BRANCH,
> -				  _("If you are sure you want to delete it, "
> -				  "run 'git branch -D %s'"), branchname);
> +		if (warn_only) {
> +			warning(_("the branch '%s' is not fully merged"),
> +				branchname);
> +		} else {
> +			error(_("the branch '%s' is not fully merged"),
> +			      branchname);
> +			advise_if_enabled(ADVICE_FORCE_DELETE_BRANCH,
> +					  _("If you are sure you want to delete it, "
> +					  "run 'git branch -D %s'"), branchname);
> +		}
> +		if (n_not_merged)
> +			(*n_not_merged)++;
>  		return -1;
>  	}
>  	return 0;

This function originally was to see if "git branch -d <derived>" is
allowed to remove it by calling branch_merged(), which uses the
upstream association of <derived>, falling back to head_rev.  For
our purpose of "--forked" check, we are not deleting, so existing
calls to error() and the phrasing used in its messages that talks
about deletion are not appropriate.  Hence we introduce the
warn_only mode to just state the fact the branch is not fully merged
to its upstream.

It is inherited from the original code, and it is outside the scope
of this series, but after the dust settles, we may want to consider
saying what other branch this branch is expected to be merged to
(i.e. "not fully merged to %s").

It is unclear how n_not_merged would be useful.  Wouldn't the caller
be capable to count the failure returns from this function?

> @@ -218,7 +229,7 @@ static void delete_branch_config(const char *branchname)
>  }
>  
>  static int delete_branches(int argc, const char **argv, int force, int kinds,
> -			   int quiet)
> +			   int quiet, int warn_only, int *n_not_merged)

It is strange to see any need to touch delete_branches() when the
only thing you are adding is the "--forked" option.  For that
matter, the change to check_branch_commit() is also suspect.

In short, this does a lot more than necessary to add "--forked".
Why such a split in the series?

> +static void parse_forked_args(int argc, const char **argv,
> +			      struct string_list *upstream_patterns)
> +{
> +	int i;
> +
> +	for (i = 0; i < argc; i++) {
> +		const char *arg = argv[i];
> +		struct object_id oid;
> +		char *full_ref = NULL;
> +		const char *short_ref;
> +
> +		if (has_glob_specials(arg)) {
> +			string_list_insert(upstream_patterns, arg);
> +			continue;
> +		}
> +
> +		if (repo_dwim_ref(the_repository, arg, strlen(arg), &oid,
> +				  &full_ref, 0) == 1 &&
> +		    (skip_prefix(full_ref, "refs/heads/", &short_ref) ||
> +		     skip_prefix(full_ref, "refs/remotes/", &short_ref))) {
> +			string_list_insert(upstream_patterns, short_ref);
> +			free(full_ref);
> +			continue;
> +		}
> +		free(full_ref);
> +
> +		die(_("'%s' is not a valid branch or pattern"), arg);
> +	}
> +}

This one does look like very much relevant to "--forked".

> +struct forked_cb {
> +	const struct string_list *upstream_patterns;
> +	struct string_list *out;
> +};

So does this.

> +static int collect_forked_branch(const struct reference *ref, void *cb_data)
> +{
> +	struct forked_cb *cb = cb_data;
> +	struct branch *branch;
> +	const char *upstream, *short_upstream;
> +	const struct string_list_item *item;
> +
> +	if (ref->flags & REF_ISSYMREF)
> +		return 0;
> +	branch = branch_get(ref->name);
> +	if (!branch)
> +		return 0;
> +	upstream = branch_get_upstream(branch, NULL);
> +	if (!upstream)
> +		return 0;
> +	short_upstream = upstream;
> +	(void)(skip_prefix(short_upstream, "refs/heads/", &short_upstream) ||
> +	       skip_prefix(short_upstream, "refs/remotes/", &short_upstream));
> +
> +	for_each_string_list_item(item, cb->upstream_patterns)

Sholdn't each element of upstream_patterns remember if it is
suitable for wildmatch or not when it gets parsed in the earlier
function we saw, so that this loop can refrain from calling
wildmatch() for non-wildcard elements?

> +		if (!wildmatch(item->string, short_upstream, WM_PATHNAME)) {
> +			string_list_append(cb->out, ref->name)->util =
> +				xstrdup(upstream);
> +			return 0;
> +		}
> +	return 0;
> +}
> +
> +static void collect_forked_set(int argc, const char **argv,
> +			       struct string_list *out)
> +{
> +	struct string_list upstream_patterns = STRING_LIST_INIT_DUP;
> +	struct forked_cb cb = {
> +		.upstream_patterns = &upstream_patterns,
> +		.out = out,
> +	};
> +
> +	parse_forked_args(argc, argv, &upstream_patterns);
> +
> +	refs_for_each_branch_ref(get_main_ref_store(the_repository),
> +				 collect_forked_branch, &cb);
> +
> +	string_list_clear(&upstream_patterns, 0);
> +}
> +
> +static int list_forked_branches(int argc, const char **argv)
> +{
> +	struct string_list out = STRING_LIST_INIT_DUP;
> +	struct string_list_item *item;
> +
> +	if (!argc)
> +		die(_("--forked requires at least one <branch>"));
> +
> +	collect_forked_set(argc, argv, &out);
> +	for_each_string_list_item(item, &out)
> +		puts(item->string);
> +
> +	string_list_clear(&out, 1);
> +	return 0;
> +}

Up to point the changes are very much appropriate for adding the
"--forked" option.

> @@ -714,6 +822,7 @@ int cmd_branch(int argc,
>  	/* possible actions */
>  	int delete = 0, rename = 0, copy = 0, list = 0,
>  	    unset_upstream = 0, show_current = 0, edit_description = 0;
> +	int forked = 0;
>  	const char *new_upstream = NULL;
>  	int noncreate_actions = 0;
>  	/* possible options */
> @@ -767,6 +876,8 @@ int cmd_branch(int argc,
>  		OPT_BOOL(0, "create-reflog", &reflog, N_("create the branch's reflog")),
>  		OPT_BOOL(0, "edit-description", &edit_description,
>  			 N_("edit the description for the branch")),
> +		OPT_BOOL(0, "forked", &forked,
> +			N_("list local branches whose upstream matches the given <branch>...")),
>  		OPT__FORCE(&force, N_("force creation, move/rename, deletion"), PARSE_OPT_NOCOMPLETE),
>  		OPT_MERGED(&filter, N_("print only branches that are merged")),
>  		OPT_NO_MERGED(&filter, N_("print only branches that are not merged")),
> @@ -811,7 +922,7 @@ int cmd_branch(int argc,
>  			     0);
>  
>  	if (!delete && !rename && !copy && !edit_description && !new_upstream &&
> -	    !show_current && !unset_upstream && argc == 0)
> +	    !show_current && !unset_upstream && !forked && argc == 0)
>  		list = 1;
>  
>  	if (filter.with_commit || filter.no_commit ||
> @@ -820,7 +931,7 @@ int cmd_branch(int argc,
>  
>  	noncreate_actions = !!delete + !!rename + !!copy + !!new_upstream +
>  			    !!show_current + !!list + !!edit_description +
> -			    !!unset_upstream;
> +			    !!unset_upstream + !!forked;
>  	if (noncreate_actions > 1)
>  		usage_with_options(builtin_branch_usage, options);
>  
> @@ -858,7 +969,11 @@ int cmd_branch(int argc,
>  	if (delete) {
>  		if (!argc)
>  			die(_("branch name required"));
> -		ret = delete_branches(argc, argv, delete > 1, filter.kind, quiet);
> +		ret = delete_branches(argc, argv, delete > 1, filter.kind,
> +				      quiet, 0, NULL);
> +		goto out;

This does not belong to "--forked" as far as I can tell.

> +	} else if (forked) {
> +		ret = list_forked_branches(argc, argv);
>  		goto out;
>  	} else if (show_current) {
>  		print_current_branch_name();

^ permalink raw reply

* [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers
From: Michael Montalbo via GitGitGadget @ 2026-05-22  2:11 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo

This series adds diff.<driver>.process, a long-running subprocess protocol
that lets external tools provide hunks to git's diff and blame pipelines.

Over the past 18 years, git's diff pipeline accumulated many features that
operate on hunks: word diff, function context, color-moved, indent
heuristic, blame. External tools can replace the pipeline entirely
(diff.<driver>.command) or select among builtin algorithms
(diff.<driver>.algorithm), but there is no way for a tool to provide
line-change information into the pipeline. Tools that understand code
structure (tree-sitter parsers, format-aware analyzers, tools like
Difftastic and Mergiraf) must bypass git's pipeline and lose access to
everything downstream.

The protocol follows filter.<driver>.process: pkt-line over stdin/stdout,
capability negotiation, one tool invocation per git command. The tool
receives file pairs and returns hunk descriptors that git feeds into the
standard xdiff pipeline. All output features work normally.

Zero hunks with status=success means the tool considers the files
equivalent. git diff shows no output for the file, and git blame skips the
commit, attributing lines to earlier commits.

On error or tool crash, git falls back silently to the builtin diff
algorithm. The feature is opt-in via diff.<driver>.process and
.gitattributes; unconfigured files are unaffected.

The series includes git diff-process-normalize, a built-in tool that
compares files line by line ignoring whitespace (same logic as "git diff -w"
via xdiff_compare_lines):

[diff "cdiff"]
    process = git diff-process-normalize

A whitespace-only boolean flag could serve this specific case. The
subprocess protocol is more general, allowing any tool to participate
without further changes to git.

Michael Montalbo (5):
  xdiff: support external hunks via xpparam_t
  userdiff: add diff.<driver>.process config
  diff: add long-running diff process via diff.<driver>.process
  blame: consult diff process for zero-hunk detection
  diff-process-normalize: add built-in whitespace normalizer

 Documentation/config/diff.adoc   |  10 +
 Documentation/gitattributes.adoc |  58 +++++
 Makefile                         |   2 +
 blame.c                          |  43 +++-
 builtin.h                        |   1 +
 builtin/diff-process-normalize.c | 143 ++++++++++
 diff-process.c                   | 203 +++++++++++++++
 diff-process.h                   |  28 ++
 diff.c                           |  25 ++
 git.c                            |   1 +
 t/t4080-diff-process.sh          | 430 +++++++++++++++++++++++++++++++
 userdiff.c                       |   7 +
 userdiff.h                       |   2 +
 xdiff-interface.c                |   7 +-
 xdiff/xdiff.h                    |  13 +
 xdiff/xdiffi.c                   |  98 ++++++-
 16 files changed, 1063 insertions(+), 8 deletions(-)
 create mode 100644 builtin/diff-process-normalize.c
 create mode 100644 diff-process.c
 create mode 100644 diff-process.h
 create mode 100755 t/t4080-diff-process.sh

base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2120%2Fmmontalbo%2Fmm%2Fstructural-diff-backend-clean-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2120/mmontalbo/mm/structural-diff-backend-clean-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2120
-- 
gitgitgadget

^ permalink raw reply

* [PATCH 1/5] xdiff: support external hunks via xpparam_t
From: Michael Montalbo via GitGitGadget @ 2026-05-22  2:11 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.git.1779415884.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add two new xpparam_t fields (external_hunks, external_hunks_nr)
that let callers supply pre-computed hunks.  When set, xdl_diff()
populates the changed[] arrays from these hunks instead of running
the diff algorithm, then continues through compaction and emission
as usual.

Validate supplied hunks before use: reject out-of-bounds line
numbers, overlapping or out-of-order hunks, negative counts, and
violations of the synchronization invariant (unchanged line counts
must match between files).  On validation failure, fall back to
the builtin diff algorithm.

Skip trim_common_tail() in xdi_diff() when external hunks are
present, since external hunks reference line numbers in the
original content.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 xdiff-interface.c |  7 +++-
 xdiff/xdiff.h     | 13 +++++++
 xdiff/xdiffi.c    | 98 +++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 114 insertions(+), 4 deletions(-)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index f043330f2a..9542c0bcc2 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -124,7 +124,12 @@ int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t co
 	if (mf1->size > MAX_XDIFF_SIZE || mf2->size > MAX_XDIFF_SIZE)
 		return -1;
 
-	if (!xecfg->ctxlen && !(xecfg->flags & XDL_EMIT_FUNCCONTEXT))
+	/*
+	 * External hunks reference line numbers in the original content;
+	 * trimming the tail would change line counts and invalidate them.
+	 */
+	if (!xpp->external_hunks &&
+	    !xecfg->ctxlen && !(xecfg->flags & XDL_EMIT_FUNCCONTEXT))
 		trim_common_tail(&a, &b);
 
 	return xdl_diff(&a, &b, xpp, xecfg, xecb);
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index dc370712e9..2ee6f1aae3 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -78,6 +78,15 @@ typedef struct s_mmbuffer {
 	long size;
 } mmbuffer_t;
 
+/*
+ * Hunk descriptor for externally computed diffs.
+ * Line numbers are 1-based, matching unified diff convention.
+ */
+struct xdl_hunk {
+	long old_start, old_count;
+	long new_start, new_count;
+};
+
 typedef struct s_xpparam {
 	unsigned long flags;
 
@@ -88,6 +97,10 @@ typedef struct s_xpparam {
 	/* See Documentation/diff-options.adoc. */
 	char **anchors;
 	size_t anchors_nr;
+
+	/* Externally computed hunks: bypass the diff algorithm. */
+	const struct xdl_hunk *external_hunks;
+	size_t external_hunks_nr;
 } xpparam_t;
 
 typedef struct s_xdemitcb {
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 5455b4690d..7eca4ab4a1 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -1085,16 +1085,108 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
 	}
 }
 
+/*
+ * Populate the changed[] arrays from externally supplied hunks,
+ * bypassing the diff algorithm.  Validates that hunks are in order,
+ * non-overlapping, and within bounds.
+ *
+ * Returns 0 on success, -1 on validation failure.
+ */
+static int xdl_populate_hunks_from_external(xdfenv_t *xe,
+					    const struct xdl_hunk *hunks,
+					    size_t nr_hunks)
+{
+	size_t i;
+	long j, prev_old_end = 0, prev_new_end = 0;
+	long total_old = 0, total_new = 0;
+
+	/*
+	 * Clear changed[] arrays.  xdl_prepare_env() may have dirtied
+	 * them via xdl_cleanup_records().  The allocation is nrec + 2
+	 * elements; changed points one past the start (see xprepare.c).
+	 */
+	memset(xe->xdf1.changed - 1, 0,
+	       (xe->xdf1.nrec + 2) * sizeof(bool));
+	memset(xe->xdf2.changed - 1, 0,
+	       (xe->xdf2.nrec + 2) * sizeof(bool));
+
+	for (i = 0; i < nr_hunks; i++) {
+		const struct xdl_hunk *h = &hunks[i];
+
+		if (h->old_count < 0 || h->new_count < 0)
+			return -1;
+
+		/* Bounds check (1-based line numbers) */
+		if (h->old_count > 0 &&
+		    (h->old_start < 1 ||
+		     h->old_start + h->old_count - 1 > xe->xdf1.nrec))
+			return -1;
+		if (h->new_count > 0 &&
+		    (h->new_start < 1 ||
+		     h->new_start + h->new_count - 1 > xe->xdf2.nrec))
+			return -1;
+
+		/* Zero-count hunks: start must still be in [1, nrec+1] */
+		if (h->old_count == 0 &&
+		    (h->old_start < 1 || h->old_start > xe->xdf1.nrec + 1))
+			return -1;
+		if (h->new_count == 0 &&
+		    (h->new_start < 1 || h->new_start > xe->xdf2.nrec + 1))
+			return -1;
+
+		/* Ordering: no overlap with previous hunk */
+		if (h->old_start < prev_old_end ||
+		    h->new_start < prev_new_end)
+			return -1;
+
+		for (j = 0; j < h->old_count; j++)
+			xe->xdf1.changed[h->old_start - 1 + j] = true;
+		for (j = 0; j < h->new_count; j++)
+			xe->xdf2.changed[h->new_start - 1 + j] = true;
+
+		prev_old_end = h->old_start + h->old_count;
+		prev_new_end = h->new_start + h->new_count;
+		total_old += h->old_count;
+		total_new += h->new_count;
+	}
+
+	/*
+	 * Synchronization invariant: unchanged line counts must match.
+	 * Otherwise xdl_build_script() would walk off one array.
+	 */
+	if ((long)xe->xdf1.nrec - total_old !=
+	    (long)xe->xdf2.nrec - total_new)
+		return -1;
+
+	return 0;
+}
+
 int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
 	xdchange_t *xscr;
 	xdfenv_t xe;
 	emit_func_t ef = xecfg->hunk_func ? xdl_call_hunk_func : xdl_emit_diff;
 
-	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0) {
-
-		return -1;
+	if (xpp->external_hunks) {
+		if (xdl_prepare_env(mf1, mf2, xpp, &xe) < 0)
+			return -1;
+		if (xdl_populate_hunks_from_external(&xe,
+						     xpp->external_hunks,
+						     xpp->external_hunks_nr) < 0) {
+			/*
+			 * Invalid external hunks; fall back to the
+			 * builtin diff algorithm.  Re-runs
+			 * xdl_prepare_env() via xdl_do_diff().
+			 */
+			xdl_free_env(&xe);
+			if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0)
+				return -1;
+		}
+	} else {
+		if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0)
+			return -1;
 	}
+
 	if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
 	    xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 ||
 	    xdl_build_script(&xe, &xscr) < 0) {
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 2/5] userdiff: add diff.<driver>.process config
From: Michael Montalbo via GitGitGadget @ 2026-05-22  2:11 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.git.1779415884.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add a new per-driver configuration key that specifies the command
for a long-running diff process.

The name follows filter.<driver>.process: a long-running subprocess
that stays alive across files within a single git invocation.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 userdiff.c | 7 +++++++
 userdiff.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/userdiff.c b/userdiff.c
index fe710a68bf..81c0bebcce 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -499,6 +499,13 @@ int userdiff_config(const char *k, const char *v)
 		drv->algorithm = drv->algorithm_owned;
 		return ret;
 	}
+	if (!strcmp(type, "process")) {
+		int ret;
+		FREE_AND_NULL(drv->process_owned);
+		ret = git_config_string(&drv->process_owned, k, v);
+		drv->process = drv->process_owned;
+		return ret;
+	}
 
 	return 0;
 }
diff --git a/userdiff.h b/userdiff.h
index 827361b0bc..51c26e0d41 100644
--- a/userdiff.h
+++ b/userdiff.h
@@ -31,6 +31,8 @@ struct userdiff_driver {
 	char *textconv_owned;
 	struct notes_cache *textconv_cache;
 	int textconv_want_cache;
+	const char *process;
+	char *process_owned;
 };
 enum userdiff_driver_type {
 	USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 3/5] diff: add long-running diff process via diff.<driver>.process
From: Michael Montalbo via GitGitGadget @ 2026-05-22  2:11 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.git.1779415884.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add support for external diff processes that communicate via the
long-running process protocol (pkt-line over stdin/stdout).

A diff process is configured per userdiff driver:

    [diff "cdiff"]
        process = /path/to/diff-tool

The tool receives file pairs and returns hunks describing which
lines changed.  Git feeds these hunks into the standard xdiff
pipeline, so all output features (word diff, function context,
color) work normally.

The handshake negotiates version=1 and capability=hunks.  Per-file
requests send command=hunks, pathname, and both file contents as
packetized data.  The tool responds with hunk lines and a status
packet.  On error, git falls back to the builtin diff algorithm.

Zero hunks with status=success means the tool considers the files
equivalent.  Git skips diff output for that file entirely.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 Documentation/config/diff.adoc   |   8 +
 Documentation/gitattributes.adoc |  40 ++++
 Makefile                         |   1 +
 diff-process.c                   | 203 +++++++++++++++++++
 diff-process.h                   |  28 +++
 diff.c                           |  25 +++
 t/t4080-diff-process.sh          | 338 +++++++++++++++++++++++++++++++
 7 files changed, 643 insertions(+)
 create mode 100644 diff-process.c
 create mode 100644 diff-process.h
 create mode 100755 t/t4080-diff-process.sh

diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc
index 1135a62a0a..4ab5f60df6 100644
--- a/Documentation/config/diff.adoc
+++ b/Documentation/config/diff.adoc
@@ -218,6 +218,14 @@ endif::git-diff[]
 	Set this option to `true` to make the diff driver cache the text
 	conversion outputs.  See linkgit:gitattributes[5] for details.
 
+`diff.<driver>.process`::
+	The command to run as a long-running diff process.
+	The tool communicates via the pkt-line protocol and returns
+	hunks that are fed into Git's diff and blame pipelines.
+	If the tool returns zero hunks, the file is treated as
+	unchanged for both diff output and blame attribution.
+	See linkgit:gitattributes[5] for details.
+
 `diff.indentHeuristic`::
 	Set this option to `false` to disable the default heuristics
 	that shift diff hunk boundaries to make patches easier to read.
diff --git a/Documentation/gitattributes.adoc b/Documentation/gitattributes.adoc
index f20041a323..cc724f8c63 100644
--- a/Documentation/gitattributes.adoc
+++ b/Documentation/gitattributes.adoc
@@ -821,6 +821,46 @@ NOTE: If `diff.<name>.command` is defined for path with the
 (see above), and adding `diff.<name>.algorithm` has no effect, as the
 algorithm is not passed to the external diff driver.
 
+Using an external diff process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+An external tool can provide content-aware line matching by
+setting `diff.<name>.process` to the command that runs
+the tool.  The tool is a long-running process that communicates via
+the pkt-line protocol (see
+linkgit:gitprotocol-long-running-process[5]).
+
+------------------------
+*.c diff=cdiff
+------------------------
+
+----------------------------------------------------------------
+[diff "cdiff"]
+  process = /path/to/diff-process-tool
+----------------------------------------------------------------
+
+The tool receives file pairs and returns hunk descriptors indicating
+which lines changed.  Git feeds these hunks into its standard diff
+pipeline, so all output features (word diff, function context,
+color) work normally.
+
+If the tool fails or returns an error, Git silently falls back to
+the builtin diff algorithm.  If the tool returns invalid hunks
+(out of bounds, overlapping), Git also falls back silently.
+
+The handshake negotiates `version=1` and `capability=hunks`.
+Per-file requests send `command=hunks` and `pathname=<path>`,
+followed by the old and new file content as packetized data.
+The tool responds with lines of the form
+`hunk <old_start> <old_count> <new_start> <new_count>`
+(1-based line numbers), a flush packet, and `status=success`.
+
+If the tool returns zero hunks with `status=success`, Git treats
+the file as having no changes and produces no diff output.
+
+Tools should ignore unknown keys in the per-file request to
+remain forward-compatible.
+
 Defining a custom hunk-header
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/Makefile b/Makefile
index cedc234173..22900368dd 100644
--- a/Makefile
+++ b/Makefile
@@ -1142,6 +1142,7 @@ LIB_OBJS += diff-delta.o
 LIB_OBJS += diff-merges.o
 LIB_OBJS += diff-lib.o
 LIB_OBJS += diff-no-index.o
+LIB_OBJS += diff-process.o
 LIB_OBJS += diff.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
diff --git a/diff-process.c b/diff-process.c
new file mode 100644
index 0000000000..7b0f0e1f7e
--- /dev/null
+++ b/diff-process.c
@@ -0,0 +1,203 @@
+/*
+ * Diff process backend: communicates with a long-running external
+ * tool via the pkt-line protocol to obtain content-aware hunks.
+ *
+ * Protocol: pkt-line over stdin/stdout, following the pattern of
+ * the long-running filter process protocol (see convert.c).
+ *
+ * Handshake:
+ *   git> git-diff-client / version=1 / flush
+ *   tool< git-diff-server / version=1 / flush
+ *   git> capability=hunks / flush
+ *   tool< capability=hunks / flush
+ *
+ * Per-file:
+ *   git> command=hunks / pathname=<path> / flush
+ *   git> <old content packetized> / flush
+ *   git> <new content packetized> / flush
+ *   tool< hunk <old_start> <old_count> <new_start> <new_count>
+ *   tool< ... / flush
+ *   tool< status=success / flush
+ *
+ * Zero hunks with status=success means the tool considers the
+ * files equivalent.  Git will skip the diff for that file.
+ */
+
+#include "git-compat-util.h"
+#include "diff-process.h"
+#include "userdiff.h"
+#include "sub-process.h"
+#include "pkt-line.h"
+#include "strbuf.h"
+#include "xdiff/xdiff.h"
+
+#define CAP_HUNKS (1u << 0)
+
+struct diff_subprocess {
+	struct subprocess_entry subprocess;
+	unsigned int supported_capabilities;
+};
+
+static int subprocess_map_initialized;
+static struct hashmap subprocess_map;
+
+static int start_diff_process_fn(struct subprocess_entry *subprocess)
+{
+	static int versions[] = { 1, 0 };
+	static struct subprocess_capability capabilities[] = {
+		{ "hunks", CAP_HUNKS },
+		{ NULL, 0 }
+	};
+	struct diff_subprocess *entry =
+		(struct diff_subprocess *)subprocess;
+
+	/* Uses dying pkt-line variant, same as convert.c filters. */
+	return subprocess_handshake(subprocess, "git-diff",
+				    versions, NULL,
+				    capabilities,
+				    &entry->supported_capabilities);
+}
+
+static struct diff_subprocess *find_or_start_process(const char *cmd)
+{
+	struct diff_subprocess *entry;
+
+	if (!subprocess_map_initialized) {
+		subprocess_map_initialized = 1;
+		hashmap_init(&subprocess_map, cmd2process_cmp, NULL, 0);
+	}
+
+	entry = (struct diff_subprocess *)
+		subprocess_find_entry(&subprocess_map, cmd);
+	if (entry)
+		return entry;
+
+	entry = xcalloc(1, sizeof(*entry));
+	if (subprocess_start(&subprocess_map, &entry->subprocess,
+			     cmd, start_diff_process_fn)) {
+		free(entry);
+		return NULL;
+	}
+
+	return entry;
+}
+
+static int send_file_content(int fd, const char *buf, long size)
+{
+	int ret;
+
+	if (size > 0)
+		ret = write_packetized_from_buf_no_flush(buf, size, fd);
+	else
+		ret = 0;
+	if (ret)
+		return ret;
+	return packet_flush_gently(fd);
+}
+
+static int parse_hunk_line(const char *line, struct xdl_hunk *hunk)
+{
+	char *end;
+
+	/* Format: "hunk <old_start> <old_count> <new_start> <new_count>" */
+	if (!skip_prefix(line, "hunk ", &line))
+		return -1;
+
+	hunk->old_start = strtol(line, &end, 10);
+	if (end == line || *end != ' ')
+		return -1;
+	line = end;
+
+	hunk->old_count = strtol(line, &end, 10);
+	if (end == line || *end != ' ')
+		return -1;
+	line = end;
+
+	hunk->new_start = strtol(line, &end, 10);
+	if (end == line || *end != ' ')
+		return -1;
+	line = end;
+
+	hunk->new_count = strtol(line, &end, 10);
+	if (end == line || *end != '\0')
+		return -1;
+
+	return 0;
+}
+
+int diff_process_get_hunks(struct userdiff_driver *drv,
+			   const char *path,
+			   const char *old_buf, long old_size,
+			   const char *new_buf, long new_size,
+			   struct xdl_hunk **hunks_out,
+			   size_t *nr_hunks_out)
+{
+	struct diff_subprocess *backend;
+	struct child_process *process;
+	int fd_in, fd_out;
+	struct strbuf status = STRBUF_INIT;
+	struct xdl_hunk *hunks = NULL;
+	struct xdl_hunk hunk;
+	size_t nr_hunks = 0, alloc_hunks = 0;
+	int len;
+	char *line;
+
+	if (!drv || !drv->process)
+		return -1;
+
+	backend = find_or_start_process(drv->process);
+	if (!backend)
+		return -1;
+
+	if (!(backend->supported_capabilities & CAP_HUNKS))
+		return -1;
+
+	process = subprocess_get_child_process(&backend->subprocess);
+	fd_in = process->in;
+	fd_out = process->out;
+
+	/* Send request */
+	if (packet_write_fmt_gently(fd_in, "command=hunks\n") ||
+	    packet_write_fmt_gently(fd_in, "pathname=%s\n", path) ||
+	    packet_flush_gently(fd_in))
+		goto error;
+
+	/* Send old file content */
+	if (send_file_content(fd_in, old_buf, old_size))
+		goto error;
+
+	/* Send new file content */
+	if (send_file_content(fd_in, new_buf, new_size))
+		goto error;
+
+	/* Read hunks until flush packet */
+	while ((len = packet_read_line_gently(fd_out, NULL, &line)) >= 0 &&
+	       line) {
+		if (parse_hunk_line(line, &hunk) < 0)
+			goto error;
+		ALLOC_GROW(hunks, nr_hunks + 1, alloc_hunks);
+		hunks[nr_hunks++] = hunk;
+	}
+	if (len < 0)
+		goto error;
+
+	/* Read status */
+	if (subprocess_read_status(fd_out, &status))
+		goto error;
+
+	if (strcmp(status.buf, "success")) {
+		if (!strcmp(status.buf, "abort"))
+			backend->supported_capabilities &= ~CAP_HUNKS;
+		goto error;
+	}
+
+	*hunks_out = hunks;
+	*nr_hunks_out = nr_hunks;
+	strbuf_release(&status);
+	return 0;
+
+error:
+	free(hunks);
+	strbuf_release(&status);
+	return -1;
+}
diff --git a/diff-process.h b/diff-process.h
new file mode 100644
index 0000000000..4c84951e02
--- /dev/null
+++ b/diff-process.h
@@ -0,0 +1,28 @@
+#ifndef DIFF_PROCESS_H
+#define DIFF_PROCESS_H
+
+struct userdiff_driver;
+struct xdl_hunk;
+
+/*
+ * Query a diff process for hunks describing the changes
+ * between old_buf and new_buf.
+ *
+ * The backend is a long-running subprocess configured via
+ * diff.<driver>.process.  It receives file content via
+ * pkt-line and returns hunks with 1-based line numbers.
+ *
+ * On success, sets *hunks_out and *nr_hunks_out to a newly allocated
+ * array (caller must free) and returns 0.
+ *
+ * On failure, returns -1.  The caller should fall back to the
+ * builtin diff algorithm.
+ */
+int diff_process_get_hunks(struct userdiff_driver *drv,
+			   const char *path,
+			   const char *old_buf, long old_size,
+			   const char *new_buf, long new_size,
+			   struct xdl_hunk **hunks_out,
+			   size_t *nr_hunks_out);
+
+#endif /* DIFF_PROCESS_H */
diff --git a/diff.c b/diff.c
index 397e38b41c..c5e7c329b2 100644
--- a/diff.c
+++ b/diff.c
@@ -25,7 +25,9 @@
 #include "utf8.h"
 #include "odb.h"
 #include "userdiff.h"
+#include "diff-process.h"
 #include "submodule.h"
+#include "trace2.h"
 #include "hashmap.h"
 #include "mem-pool.h"
 #include "merge-ll.h"
@@ -3991,6 +3993,7 @@ static void builtin_diff(const char *name_a,
 		xpparam_t xpp;
 		xdemitconf_t xecfg;
 		struct emit_callback ecbdata;
+		struct xdl_hunk *ext_hunks = NULL;
 		unsigned ws_rule;
 		const struct userdiff_funcname *pe;
 
@@ -4031,6 +4034,27 @@ static void builtin_diff(const char *name_a,
 		xpp.ignore_regex_nr = o->ignore_regex_nr;
 		xpp.anchors = o->anchors;
 		xpp.anchors_nr = o->anchors_nr;
+
+		if (!o->ignore_driver_algorithm &&
+		    one->driver && one->driver->process) {
+			size_t ext_hunks_nr = 0;
+			if (!diff_process_get_hunks(
+				    one->driver, name_a,
+				    mf1.ptr, mf1.size,
+				    mf2.ptr, mf2.size,
+				    &ext_hunks, &ext_hunks_nr)) {
+				if (!ext_hunks_nr)
+					goto free_ab_and_return;
+				xpp.external_hunks = ext_hunks;
+				xpp.external_hunks_nr = ext_hunks_nr;
+			} else {
+				trace2_data_string("diff",
+						   o->repo,
+						   "diff-process-fallback",
+						   name_a);
+			}
+		}
+
 		xecfg.ctxlen = o->context;
 		xecfg.interhunkctxlen = o->interhunkcontext;
 		xecfg.flags = XDL_EMIT_FUNCNAMES;
@@ -4111,6 +4135,7 @@ static void builtin_diff(const char *name_a,
 		} else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
 					 &ecbdata, &xpp, &xecfg))
 			die("unable to generate diff for %s", one->path);
+		free(ext_hunks);
 		if (o->word_diff)
 			free_diff_words_data(&ecbdata);
 		if (textconv_one)
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
new file mode 100755
index 0000000000..6f49f4e66b
--- /dev/null
+++ b/t/t4080-diff-process.sh
@@ -0,0 +1,338 @@
+#!/bin/sh
+
+test_description='diff process via long-running process'
+
+. ./test-lib.sh
+
+if test_have_prereq PYTHON
+then
+	PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
+fi
+
+#
+# A single parametric diff process.
+# Usage: diff-process-backend --mode=<mode> [--log=<path>]
+#
+# Modes:
+#   whole-file  - report all lines as changed (default)
+#   fixed-hunk  - always report hunk 5 2 5 2
+#   bad-hunk    - report out-of-bounds hunk 999 1 999 1
+#   zero-hunk   - return zero hunks (files considered equivalent)
+#   error       - return status=error for every request
+#   abort       - return status=abort for every request
+#   crash       - read one request then exit without responding
+#
+setup_backend () {
+	cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF
+	import sys, os
+
+	def read_pkt():
+	    hdr = sys.stdin.buffer.read(4)
+	    if len(hdr) < 4: return None
+	    length = int(hdr, 16)
+	    if length == 0: return ""
+	    data = sys.stdin.buffer.read(length - 4)
+	    return data.decode().rstrip("\n")
+
+	def write_pkt(line):
+	    data = (line + "\n").encode()
+	    sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
+	    sys.stdout.buffer.flush()
+
+	def write_flush():
+	    sys.stdout.buffer.write(b"0000")
+	    sys.stdout.buffer.flush()
+
+	def read_content():
+	    chunks = []
+	    while True:
+	        hdr = sys.stdin.buffer.read(4)
+	        if len(hdr) < 4: break
+	        length = int(hdr, 16)
+	        if length == 0: break
+	        chunks.append(sys.stdin.buffer.read(length - 4))
+	    return b"".join(chunks)
+
+	mode = "whole-file"
+	logfile = None
+	for arg in sys.argv[1:]:
+	    if arg.startswith("--mode="):
+	        mode = arg[7:]
+	    elif arg.startswith("--log="):
+	        logfile = open(arg[6:], "a")
+
+	def log(msg):
+	    if logfile:
+	        logfile.write(msg + "\n")
+	        logfile.flush()
+
+	# Handshake
+	assert read_pkt() == "git-diff-client"
+	assert read_pkt() == "version=1"
+	read_pkt()
+	write_pkt("git-diff-server")
+	write_pkt("version=1")
+	write_flush()
+	while True:
+	    p = read_pkt()
+	    if p == "": break
+	write_pkt("capability=hunks")
+	write_flush()
+
+	log("ready")
+
+	while True:
+	    cmd = None
+	    pathname = None
+	    while True:
+	        p = read_pkt()
+	        if p is None: sys.exit(0)
+	        if p == "": break
+	        if p.startswith("command="): cmd = p.split("=",1)[1]
+	        if p.startswith("pathname="): pathname = p.split("=",1)[1]
+	    if cmd is None: sys.exit(0)
+	    old = read_content()
+	    new = read_content()
+	    log(f"command={cmd} pathname={pathname}")
+
+	    if mode == "error":
+	        write_flush()
+	        write_pkt("status=error")
+	        write_flush()
+	        continue
+
+	    if mode == "abort":
+	        write_flush()
+	        write_pkt("status=abort")
+	        write_flush()
+	        continue
+
+	    if mode == "crash":
+	        sys.exit(1)
+
+	    if cmd == "hunks":
+	        if mode == "fixed-hunk":
+	            write_pkt("hunk 5 2 5 2")
+	        elif mode == "bad-hunk":
+	            write_pkt("hunk 999 1 999 1")
+	        elif mode == "zero-hunk":
+	            pass
+	        else:
+	            ol = len(old.split(b"\n"))
+	            nl = len(new.split(b"\n"))
+	            write_pkt(f"hunk 1 {ol} 1 {nl}")
+	        write_flush()
+	        write_pkt("status=success")
+	        write_flush()
+	    else:
+	        write_flush()
+	        write_pkt("status=error")
+	        write_flush()
+	PYEOF
+	write_script diff-process-backend <<-SHEOF
+	exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@"
+	SHEOF
+}
+
+BACKEND="./diff-process-backend"
+
+test_expect_success PYTHON 'setup' '
+	setup_backend &&
+	echo "*.c diff=cdiff" >.gitattributes &&
+	git add .gitattributes &&
+	git commit -m "initial"
+'
+
+test_expect_success PYTHON 'diff process hunk boundaries affect output' '
+	cat >boundary.c <<-\EOF &&
+	line1
+	line2
+	line3
+	line4
+	OLD5
+	OLD6
+	line7
+	line8
+	OLD9
+	OLD10
+	EOF
+	git add boundary.c &&
+	git commit -m "add boundary.c" &&
+
+	cat >boundary.c <<-\EOF &&
+	line1
+	line2
+	line3
+	line4
+	NEW5
+	NEW6
+	line7
+	line8
+	NEW9
+	NEW10
+	EOF
+
+	# The file has changes at lines 5-6 and 9-10, but fixed-hunk
+	# only reports lines 5-6 as changed.  Lines 9-10 should not
+	# appear as changed in the output.
+	git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
+		diff boundary.c >actual &&
+	grep "^-OLD5" actual &&
+	grep "^-OLD6" actual &&
+	grep "^+NEW5" actual &&
+	grep "^+NEW6" actual &&
+	! grep "^-OLD9" actual &&
+	! grep "^-OLD10" actual &&
+	! grep "^+NEW9" actual &&
+	! grep "^+NEW10" actual
+'
+
+test_expect_success PYTHON 'diff process fallback on tool error status' '
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
+		diff boundary.c >actual &&
+	# Fallback produces the full builtin diff (both change regions).
+	grep "^-OLD5" actual &&
+	grep "^+NEW5" actual &&
+	grep "^-OLD9" actual &&
+	grep "^+NEW9" actual &&
+	# Tool was contacted (it replied with error, not crash).
+	grep "command=hunks pathname=boundary.c" backend.log
+'
+
+test_expect_success PYTHON 'diff process fallback on bad hunks' '
+	git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
+		diff boundary.c >actual &&
+	grep "^-OLD5" actual &&
+	grep "^+NEW5" actual &&
+	grep "^-OLD9" actual &&
+	grep "^+NEW9" actual
+'
+
+test_expect_success PYTHON 'diff process fallback on tool crash' '
+	git -c diff.cdiff.process="$BACKEND --mode=crash" \
+		diff boundary.c >actual &&
+	grep "^-OLD5" actual &&
+	grep "^+NEW5" actual &&
+	grep "^-OLD9" actual &&
+	grep "^+NEW9" actual
+'
+
+test_expect_success PYTHON 'diff process abort disables for session' '
+	cat >abort1.c <<-\EOF &&
+	int first(void) { return 1; }
+	EOF
+	cat >abort2.c <<-\EOF &&
+	int second(void) { return 2; }
+	EOF
+	git add abort1.c abort2.c &&
+	git commit -m "add abort files" &&
+
+	cat >abort1.c <<-\EOF &&
+	int first(void) { return 10; }
+	EOF
+	cat >abort2.c <<-\EOF &&
+	int second(void) { return 20; }
+	EOF
+
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
+		diff -- abort1.c abort2.c >actual &&
+	# Both files should still produce diff output via fallback.
+	grep "return 10" actual &&
+	grep "return 20" actual &&
+	# The tool aborts on the first file and git clears its
+	# capability.  The second file never contacts the tool,
+	# so the log should have exactly one entry, not two.
+	grep "command=hunks" backend.log >matches &&
+	test_line_count = 1 matches
+'
+
+test_expect_success PYTHON 'diff process handles multiple files' '
+	cat >multi1.c <<-\EOF &&
+	int one(void) { return 1; }
+	EOF
+	cat >multi2.c <<-\EOF &&
+	int two(void) { return 2; }
+	EOF
+	git add multi1.c multi2.c &&
+	git commit -m "add multi files" &&
+
+	cat >multi1.c <<-\EOF &&
+	int one(void) { return 10; }
+	EOF
+	cat >multi2.c <<-\EOF &&
+	int two(void) { return 20; }
+	EOF
+
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff -- multi1.c multi2.c >actual &&
+	grep "return 10" actual &&
+	grep "return 20" actual &&
+	grep "pathname=multi1.c" backend.log &&
+	grep "pathname=multi2.c" backend.log
+'
+
+test_expect_success PYTHON 'diff process with --word-diff' '
+	cat >worddiff.c <<-\EOF &&
+	int value(void) { return 1; }
+	EOF
+	git add worddiff.c &&
+	git commit -m "add worddiff.c" &&
+
+	cat >worddiff.c <<-\EOF &&
+	int value(void) { return 999; }
+	EOF
+
+	git -c diff.cdiff.process="$BACKEND" \
+		diff --word-diff worddiff.c >actual &&
+	grep "\[-1;-\]" actual &&
+	grep "{+999;+}" actual
+'
+
+test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff --diff-algorithm=patience worddiff.c >actual &&
+	grep "return 999" actual &&
+	test_path_is_missing backend.log
+'
+
+test_expect_success PYTHON 'diff process works with git log -p' '
+	cat >logtest.c <<-\EOF &&
+	int logfunc(void) { return 1; }
+	EOF
+	git add logtest.c &&
+	git commit -m "add logtest.c" &&
+
+	cat >logtest.c <<-\EOF &&
+	int logfunc(void) { return 2; }
+	EOF
+	git add logtest.c &&
+	git commit -m "change logtest.c" &&
+
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		log -1 -p -- logtest.c >actual &&
+	grep "return 2" actual &&
+	grep "command=hunks pathname=logtest.c" backend.log
+'
+
+test_expect_success PYTHON 'diff process zero hunks suppresses diff output' '
+	cat >zerohunk.c <<-\EOF &&
+	int zero(void) { return 0; }
+	EOF
+	git add zerohunk.c &&
+	git commit -m "add zerohunk.c" &&
+
+	cat >zerohunk.c <<-\EOF &&
+	int zero(void) { return 999; }
+	EOF
+
+	git -c diff.cdiff.process="$BACKEND --mode=zero-hunk" \
+		diff zerohunk.c >actual &&
+	test_must_be_empty actual
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 4/5] blame: consult diff process for zero-hunk detection
From: Michael Montalbo via GitGitGadget @ 2026-05-22  2:11 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.git.1779415884.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

When a diff process is configured via diff.<driver>.process,
consult it during blame's per-commit diffing.  If the process
returns zero hunks for a commit's changes to a file, treat the
commit as having no changes, causing blame to attribute lines
to earlier commits.

The subprocess is long-running (one startup cost amortized
across the blame traversal), but each commit in the file's
history incurs a round-trip to the tool.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 Documentation/gitattributes.adoc |  3 +++
 blame.c                          | 43 +++++++++++++++++++++++++++++---
 t/t4080-diff-process.sh          | 32 ++++++++++++++++++++++++
 3 files changed, 74 insertions(+), 4 deletions(-)

diff --git a/Documentation/gitattributes.adoc b/Documentation/gitattributes.adoc
index cc724f8c63..7d66fa3aa1 100644
--- a/Documentation/gitattributes.adoc
+++ b/Documentation/gitattributes.adoc
@@ -857,6 +857,9 @@ The tool responds with lines of the form
 
 If the tool returns zero hunks with `status=success`, Git treats
 the file as having no changes and produces no diff output.
+`git blame` also consults the diff process and skips commits
+where it reports zero hunks, attributing lines to earlier commits
+instead.
 
 Tools should ignore unknown keys in the per-file request to
 remain forward-compatible.
diff --git a/blame.c b/blame.c
index a3c49d132e..8a5f14db7a 100644
--- a/blame.c
+++ b/blame.c
@@ -19,6 +19,8 @@
 #include "tag.h"
 #include "trace2.h"
 #include "blame.h"
+#include "diff-process.h"
+#include "userdiff.h"
 #include "alloc.h"
 #include "commit-slab.h"
 #include "bloom.h"
@@ -315,16 +317,47 @@ static struct commit *fake_working_tree_commit(struct repository *r,
 
 
 static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
-		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
+		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data,
+		      int xdl_opts, struct index_state *istate,
+		      const char *path)
 {
 	xpparam_t xpp = {0};
 	xdemitconf_t xecfg = {0};
 	xdemitcb_t ecb = {NULL};
+	struct xdl_hunk *ext_hunks = NULL;
+	int ret;
 
 	xpp.flags = xdl_opts;
 	xecfg.hunk_func = hunk_func;
 	ecb.priv = cb_data;
-	return xdi_diff(file_a, file_b, &xpp, &xecfg, &ecb);
+
+	if (path && istate) {
+		struct userdiff_driver *drv;
+		drv = userdiff_find_by_path(istate, path);
+		if (drv && drv->process) {
+			size_t nr = 0;
+			if (!diff_process_get_hunks(drv, path,
+						    file_a->ptr, file_a->size,
+						    file_b->ptr, file_b->size,
+						    &ext_hunks, &nr)) {
+				if (!nr) {
+					/*
+					 * Zero hunks: the diff process
+					 * considers these files equivalent.
+					 * Skip so blame looks past this
+					 * commit.
+					 */
+					return 0;
+				}
+				xpp.external_hunks = ext_hunks;
+				xpp.external_hunks_nr = nr;
+			}
+		}
+	}
+
+	ret = xdi_diff(file_a, file_b, &xpp, &xecfg, &ecb);
+	free(ext_hunks);
+	return ret;
 }
 
 static const char *get_next_line(const char *start, const char *end)
@@ -1961,7 +1994,8 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
 			 &sb->num_read_blob, ignore_diffs);
 	sb->num_get_patch++;
 
-	if (diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, sb->xdl_opts))
+	if (diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, sb->xdl_opts,
+		       sb->revs->diffopt.repo->index, target->path))
 		die("unable to generate diff (%s -> %s)",
 		    oid_to_hex(&parent->commit->object.oid),
 		    oid_to_hex(&target->commit->object.oid));
@@ -2114,7 +2148,8 @@ static void find_copy_in_blob(struct blame_scoreboard *sb,
 	 * file_p partially may match that image.
 	 */
 	memset(split, 0, sizeof(struct blame_entry [3]));
-	if (diff_hunks(file_p, &file_o, handle_split_cb, &d, sb->xdl_opts))
+	if (diff_hunks(file_p, &file_o, handle_split_cb, &d, sb->xdl_opts,
+		       NULL, NULL))
 		die("unable to generate diff (%s)",
 		    oid_to_hex(&parent->commit->object.oid));
 	/* remainder, if any, all match the preimage */
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index 6f49f4e66b..5ed644b786 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -335,4 +335,36 @@ test_expect_success PYTHON 'diff process zero hunks suppresses diff output' '
 	test_must_be_empty actual
 '
 
+test_expect_success PYTHON 'blame skips commits with zero hunks from diff process' '
+	cat >blame.c <<-\EOF &&
+	int main(void)
+	{
+	    return 0;
+	}
+	EOF
+	git add blame.c &&
+	git commit -m "add blame.c" &&
+
+	cat >blame.c <<-\EOF &&
+	int main(void)
+	{
+	        return 0;
+	}
+	EOF
+	git add blame.c &&
+	git commit -m "reformat blame.c" &&
+	BLAME_COMMIT=$(git rev-parse --short HEAD) &&
+
+	# Without zero-hunk mode, blame attributes the change.
+	git blame blame.c >without &&
+	grep "$BLAME_COMMIT" without &&
+
+	# With zero-hunk mode, the process considers the files equivalent
+	# and blame skips the reformat commit.
+	git -c diff.cdiff.process="$BACKEND --mode=zero-hunk" \
+		blame blame.c >with &&
+	! grep "$BLAME_COMMIT" with
+'
+
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 5/5] diff-process-normalize: add built-in whitespace normalizer
From: Michael Montalbo via GitGitGadget @ 2026-05-22  2:11 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.git.1779415884.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add git diff-process-normalize, a built-in diff process that
detects whitespace-only changes.  It compares files line by line
using xdiff_compare_lines() with XDF_IGNORE_WHITESPACE (same
logic as "git diff -w").  If all lines match, it returns zero
hunks; otherwise it returns an error so git falls back to the
builtin diff algorithm.

    [diff "cdiff"]
        process = git diff-process-normalize

Update documentation to describe zero-hunk behavior for diff
and blame, and document the built-in normalize tool.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 Documentation/config/diff.adoc   |   2 +
 Documentation/gitattributes.adoc |  15 ++++
 Makefile                         |   1 +
 builtin.h                        |   1 +
 builtin/diff-process-normalize.c | 143 +++++++++++++++++++++++++++++++
 git.c                            |   1 +
 t/t4080-diff-process.sh          |  60 +++++++++++++
 7 files changed, 223 insertions(+)
 create mode 100644 builtin/diff-process-normalize.c

diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc
index 4ab5f60df6..475736c6ed 100644
--- a/Documentation/config/diff.adoc
+++ b/Documentation/config/diff.adoc
@@ -224,6 +224,8 @@ endif::git-diff[]
 	hunks that are fed into Git's diff and blame pipelines.
 	If the tool returns zero hunks, the file is treated as
 	unchanged for both diff output and blame attribution.
+	Git provides `git diff-process-normalize` as a built-in
+	tool that detects whitespace-only changes.
 	See linkgit:gitattributes[5] for details.
 
 `diff.indentHeuristic`::
diff --git a/Documentation/gitattributes.adoc b/Documentation/gitattributes.adoc
index 7d66fa3aa1..3f1d7affd8 100644
--- a/Documentation/gitattributes.adoc
+++ b/Documentation/gitattributes.adoc
@@ -861,6 +861,21 @@ the file as having no changes and produces no diff output.
 where it reports zero hunks, attributing lines to earlier commits
 instead.
 
+Git ships with a built-in diff process, `git diff-process-normalize`,
+that detects whitespace-only changes.  Files whose only differences
+are whitespace produce zero hunks; files with non-whitespace changes
+fall back to the builtin diff algorithm.  To use it:
+
+----------------------------------------------------------------
+[diff "cdiff"]
+  process = git diff-process-normalize
+----------------------------------------------------------------
+
+This is useful after running a code formatter: `git diff` shows
+no output for files that only had whitespace changes,
+`git blame` skips whitespace-only commits automatically without
+requiring a `.git-blame-ignore-revs` file.
+
 Tools should ignore unknown keys in the per-file request to
 remain forward-compatible.
 
diff --git a/Makefile b/Makefile
index 22900368dd..01acfaf7b8 100644
--- a/Makefile
+++ b/Makefile
@@ -1409,6 +1409,7 @@ BUILTIN_OBJS += builtin/diagnose.o
 BUILTIN_OBJS += builtin/diff-files.o
 BUILTIN_OBJS += builtin/diff-index.o
 BUILTIN_OBJS += builtin/diff-pairs.o
+BUILTIN_OBJS += builtin/diff-process-normalize.o
 BUILTIN_OBJS += builtin/diff-tree.o
 BUILTIN_OBJS += builtin/diff.o
 BUILTIN_OBJS += builtin/difftool.o
diff --git a/builtin.h b/builtin.h
index 235c51f30e..c713a0417f 100644
--- a/builtin.h
+++ b/builtin.h
@@ -178,6 +178,7 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix, struct repos
 int cmd_diff_index(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_diff(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_diff_pairs(int argc, const char **argv, const char *prefix, struct repository *repo);
+int cmd_diff_process_normalize(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_diff_tree(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_difftool(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_env__helper(int argc, const char **argv, const char *prefix, struct repository *repo);
diff --git a/builtin/diff-process-normalize.c b/builtin/diff-process-normalize.c
new file mode 100644
index 0000000000..1580f6b7d9
--- /dev/null
+++ b/builtin/diff-process-normalize.c
@@ -0,0 +1,143 @@
+/*
+ * Built-in diff process that returns zero hunks for files whose
+ * only differences are whitespace, and status=error otherwise.
+ * See diff-process.c for the protocol and gitattributes(5) for usage.
+ *
+ * Uses xdiff_compare_lines() with XDF_IGNORE_WHITESPACE to compare
+ * lines, giving the same whitespace handling as "git diff -w".
+ */
+
+#include "builtin.h"
+#include "pkt-line.h"
+#include "strbuf.h"
+#include "xdiff-interface.h"
+
+/*
+ * Read a single pkt-line.  Returns 1 for data, 0 for flush, -1 for EOF.
+ */
+static int read_pkt(int fd, struct strbuf *line)
+{
+	int len;
+	char *data;
+
+	if (packet_read_line_gently(fd, &len, &data) < 0)
+		return -1;
+	if (!data || !len)
+		return 0; /* flush */
+	strbuf_reset(line);
+	strbuf_add(line, data, len);
+	strbuf_rtrim(line);
+	return 1;
+}
+
+/*
+ * Read packetized content until a flush packet.
+ */
+static int read_content(int fd, struct strbuf *out)
+{
+	strbuf_reset(out);
+	if (read_packetized_to_strbuf(fd, out, PACKET_READ_GENTLE_ON_EOF) < 0)
+		return -1;
+	return 0;
+}
+
+/*
+ * Compare two buffers line by line using xdiff_compare_lines() with
+ * XDF_IGNORE_WHITESPACE (same logic as "git diff -w").
+ * Returns 1 if all lines match, 0 otherwise.
+ */
+static int whitespace_equivalent(const char *a, long size_a,
+				 const char *b, long size_b)
+{
+	const char *ea = a + size_a;
+	const char *eb = b + size_b;
+
+	while (a < ea && b < eb) {
+		const char *eol_a = memchr(a, '\n', ea - a);
+		const char *eol_b = memchr(b, '\n', eb - b);
+		long len_a = (eol_a ? eol_a : ea) - a;
+		long len_b = (eol_b ? eol_b : eb) - b;
+
+		if (!xdiff_compare_lines(a, len_a, b, len_b,
+					 XDF_IGNORE_WHITESPACE))
+			return 0;
+
+		a += len_a + (eol_a ? 1 : 0);
+		b += len_b + (eol_b ? 1 : 0);
+	}
+
+	/* Both sides must be exhausted */
+	return a >= ea && b >= eb;
+}
+
+int cmd_diff_process_normalize(int argc UNUSED, const char **argv UNUSED,
+			       const char *prefix UNUSED,
+			       struct repository *repo UNUSED)
+{
+	struct strbuf line = STRBUF_INIT;
+	struct strbuf old_content = STRBUF_INIT;
+	struct strbuf new_content = STRBUF_INIT;
+	int ret;
+
+	/* Handshake: read client greeting */
+	ret = read_pkt(0, &line);
+	if (ret <= 0 || strcmp(line.buf, "git-diff-client"))
+		return 1;
+	ret = read_pkt(0, &line);
+	if (ret <= 0 || strcmp(line.buf, "version=1"))
+		return 1;
+	read_pkt(0, &line); /* flush */
+
+	/* Send server greeting */
+	packet_write_fmt(1, "git-diff-server\n");
+	packet_write_fmt(1, "version=1\n");
+	packet_flush(1);
+
+	/* Read client capabilities until flush */
+	while ((ret = read_pkt(0, &line)) > 0)
+		; /* consume */
+
+	/* Send our capabilities */
+	packet_write_fmt(1, "capability=hunks\n");
+	packet_flush(1);
+
+	/* Main loop: process file pairs */
+	for (;;) {
+		int have_command = 0;
+
+		/* Read request headers until flush */
+		while ((ret = read_pkt(0, &line)) > 0) {
+			if (starts_with(line.buf, "command="))
+				have_command = 1;
+		}
+		if (ret < 0)
+			break; /* EOF: client closed connection */
+		if (!have_command)
+			break;
+
+		/* Read old file content */
+		if (read_content(0, &old_content) < 0)
+			break;
+		/* Read new file content */
+		if (read_content(0, &new_content) < 0)
+			break;
+
+		if (whitespace_equivalent(old_content.buf, old_content.len,
+					  new_content.buf, new_content.len)) {
+			/* Whitespace-only differences */
+			packet_flush(1); /* zero hunks */
+			packet_write_fmt(1, "status=success\n");
+			packet_flush(1);
+		} else {
+			/* Non-whitespace differences: fall back */
+			packet_flush(1);
+			packet_write_fmt(1, "status=error\n");
+			packet_flush(1);
+		}
+	}
+
+	strbuf_release(&line);
+	strbuf_release(&old_content);
+	strbuf_release(&new_content);
+	return 0;
+}
diff --git a/git.c b/git.c
index 5a40eab8a2..6239240b02 100644
--- a/git.c
+++ b/git.c
@@ -568,6 +568,7 @@ static struct cmd_struct commands[] = {
 	{ "diff-files", cmd_diff_files, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "diff-index", cmd_diff_index, RUN_SETUP | NO_PARSEOPT },
 	{ "diff-pairs", cmd_diff_pairs, RUN_SETUP | NO_PARSEOPT },
+	{ "diff-process-normalize", cmd_diff_process_normalize, NO_PARSEOPT },
 	{ "diff-tree", cmd_diff_tree, RUN_SETUP | NO_PARSEOPT },
 	{ "difftool", cmd_difftool, RUN_SETUP_GENTLY },
 	{ "fast-export", cmd_fast_export, RUN_SETUP },
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index 5ed644b786..a6fa1df456 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -366,5 +366,65 @@ test_expect_success PYTHON 'blame skips commits with zero hunks from diff proces
 	! grep "$BLAME_COMMIT" with
 '
 
+NORMALIZE="git diff-process-normalize"
+
+test_expect_success 'diff-process-normalize setup' '
+	echo "*.c diff=cdiff" >.gitattributes &&
+	git add .gitattributes &&
+	test_commit normalize-base
+'
+
+test_expect_success 'diff-process-normalize suppresses whitespace-only changes' '
+	cat >ws.c <<-\EOF &&
+	int main(void)
+	{
+	    return 0;
+	}
+	EOF
+	git add ws.c &&
+	git commit -m "add ws.c" &&
+
+	cat >ws.c <<-\EOF &&
+	int main(void)
+	{
+	        return 0;
+	}
+	EOF
+
+	git -c diff.cdiff.process="$NORMALIZE" \
+		diff ws.c >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'diff-process-normalize falls back on non-whitespace changes' '
+	cat >ws.c <<-\EOF &&
+	int main(void)
+	{
+	    return 0;
+	}
+
+	int added_function(void)
+	{
+	    return 99;
+	}
+	EOF
+
+	git -c diff.cdiff.process="$NORMALIZE" \
+		diff ws.c >actual &&
+	grep "added_function" actual
+'
+
+test_expect_success 'diff-process-normalize falls back on mixed whitespace and real changes' '
+	cat >ws.c <<-\EOF &&
+	int main(void)
+	{
+	        return 42;
+	}
+	EOF
+
+	git -c diff.cdiff.process="$NORMALIZE" \
+		diff ws.c >actual &&
+	grep "return 42" actual
+'
 
 test_done
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH v10 2/4] branch: add --prune-merged <branch>
From: Junio C Hamano @ 2026-05-22  2:51 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget
  Cc: git, Kristoffer Haugsbakk, Johannes Sixt, Phillip Wood,
	Harald Nordgren
In-Reply-To: <718e28c7e0120a826385189213cccec1f0fce1af.1779403204.git.gitgitgadget@gmail.com>

"Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:



> diff --git a/builtin/branch.c b/builtin/branch.c
> index 1e24c95a69..29d38e9060 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c

Due to the way the patch is split between 1/4 and 2/4, it is
impossible to comment on the change to delete_branches etc. that are
needed for this step.  I'll use "git diff master... builtin/" instead.

> diff --git a/builtin/branch.c b/builtin/branch.c
> index 1572a4f9ef..b89fd56112 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -1,43 +1,47 @@
> ...
> @@ -132,78 +136,87 @@ static const char *branch_get_color(enum color_branch ix)
>  static int branch_merged(int kind, const char *name,
>  			 struct commit *rev, struct commit *head_rev)
>  {

This one is called from check_branch_commit().  In "--prune-merged"
code paths, as we will see below, kind==FILTER_REFS_BRANCHES is
passed.

But note that there is no reason to expect "--prune-merged" will be
the only one to pass FILTER_REFS_BRANCHES to this function.  Most
notably, "git branch -d" without "-r" would set FILTER_REFS_BRANCHES
to filter.kind and passes the value here.

>  	/*
>  	 * This checks whether the merge bases of branch and HEAD (or
>  	 * the other branch this branch builds upon) contains the
>  	 * branch, which means that the branch has already been merged
>  	 * safely to HEAD (or the other branch).
>  	 */
>  	struct commit *reference_rev = NULL;
>  	const char *reference_name = NULL;
>  	void *reference_name_to_free = NULL;
>  	int merged;
>  
>  	if (kind == FILTER_REFS_BRANCHES) {
>  		struct branch *branch = branch_get(name);
>  		const char *upstream = branch_get_upstream(branch, NULL);
>  		struct object_id oid;
>  
>  		if (upstream &&
>  		    (reference_name = reference_name_to_free =
>  		     refs_resolve_refdup(get_main_ref_store(the_repository), upstream, RESOLVE_REF_READING,
>  					 &oid, NULL)) != NULL)
>  			reference_rev = lookup_commit_reference(the_repository,
>  								&oid);
>  	}

We find its upstream in reference_rev; if this is non-NULL, the
branch MUST be merged to that revision for it to be safely removed.

>  	if (!reference_rev)
>  		reference_rev = head_rev;

But when ehad_rev is given, and if there is no upstream, we check if
the branch is merged to whatever happens to be checked out instead.
For the purpose of "--prune-merged", therefore, we MUST pass NULL in
head_rev when we call this function.  We'll see what is done in the
caller later.

"Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/builtin/branch.c b/builtin/branch.c
> index 1e24c95a69..29d38e9060 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -172,8 +174,8 @@ static int branch_merged(int kind, const char *name,
>  	 * any of the following code, but during the transition period,
>  	 * a gentle reminder is in order.
>  	 */
> -	if (head_rev != reference_rev) {
> -		int expect = head_rev ? repo_in_merge_bases(the_repository, rev, head_rev) : 0;
> +	if (head_rev && head_rev != reference_rev) {
> +		int expect = repo_in_merge_bases(the_repository, rev, head_rev);

Any caller who passed head_rev==NULL still used to come into this
block, set expect to 0, and have it compared with merged.  If merged
is 0 (which is what happens when reference_rev is NULL and head_rev
is NULL), the control left this block silently due to /* okay */
below (which is in the post-context that is not shown).

The updated code refuses control to come into this block when
head_rev is NULL.  I am not sure why the change in this hunk is
needed.

> @@ -748,6 +750,25 @@ static int collect_forked_branch(const struct reference *ref, void *cb_data)
>  	return 0;
>  }
>  
> +static int collect_default_branch_name(struct remote *remote, void *cb_data)
> +{
> +	struct string_list *protected = cb_data;
> +	struct ref_store *refs = get_main_ref_store(the_repository);
> +	struct strbuf head = STRBUF_INIT;
> +	const char *target;
> +
> +	strbuf_addf(&head, "refs/remotes/%s/HEAD", remote->name);
> +	target = refs_resolve_ref_unsafe(refs, head.buf,
> +					 RESOLVE_REF_NO_RECURSE, NULL, NULL);
> +	if (target) {
> +		const char *leaf = strrchr(target, '/');
> +		if (leaf)
> +			string_list_insert(protected, leaf + 1);
> +	}
> +	strbuf_release(&head);
> +	return 0;
> +}

It is strange to assume that whatever upstream repository happens to
use as its primary branch name has anything to do with how the local
repository names its primary branch.  Shouldn't this instead use
init.defaultBranch configuration or something?

> @@ -781,6 +802,63 @@ static int list_forked_branches(int argc, const char **argv)
>  	return 0;
>  }
>  
> +static int prune_merged_branches(int argc, const char **argv, int quiet)
> +{
> +	struct ref_store *refs = get_main_ref_store(the_repository);
> +	struct string_list candidates = STRING_LIST_INIT_DUP;
> +	struct string_list protected_default_names = STRING_LIST_INIT_DUP;
> +	struct strvec deletable = STRVEC_INIT;
> +	struct strbuf buf = STRBUF_INIT;
> +	struct string_list_item *item;
> +	int n_not_merged = 0;
> +	int ret = 0;
> +
> +	if (!argc)
> +		die(_("--prune-merged requires at least one <branch>"));
> +
> +	collect_forked_set(argc, argv, &candidates);

Due to poor separation of changes, all the necessary information to
assess how sane the above code is is hidden in [1/4] and not here.

> +	for_each_remote(collect_default_branch_name, &protected_default_names);

This looks inefficient, and worse, incorrect.

If we have multiple branches in argv[], they may have come from
different upstream, and because you pay attention to what the
refs/remotes/<upstream>/HEAD symrefs point at, you have multiple
such "protected" default names.  You'd compare each and every
candidates against these names using linear search in the
string_list to see if they are protected (inefficient), and you
reject removal of argv[1] even when it happens to be the same name
as the default in the repository from which the upstream of argv[3]
came from, i.e., has no relationship with argv[1] (incorrect).

> +	for_each_string_list_item(item, &candidates) {
> +		const char *short_name = item->string;
> +		const char *upstream = item->util;
> +
> +		strbuf_reset(&buf);
> +		strbuf_addf(&buf, "refs/heads/%s", short_name);
> +		if (branch_checked_out(buf.buf))
> +			continue;
> +
> +		if (string_list_has_string(&protected_default_names,
> +					   short_name))
> +			continue;
> +
> +		if (!refs_ref_exists(refs, upstream))
> +			continue;
> +
> +		strvec_push(&deletable, short_name);
> +	}
> +	strbuf_release(&buf);
> +
> +	if (deletable.nr)
> +		ret = delete_branches(deletable.nr, deletable.v,
> +				      0, FILTER_REFS_BRANCHES, quiet,
> +				      1, &n_not_merged);

Add comments on parameters, perhaps, to make it readable?

		ret = delete_branches(deletable.nr, deletable.v,
				      0, /* no force */
				      FILTER_REFS_BRANCHES,
				      quiet,
				      1, /* warn only */
				      &n_not_merged);

or something?

Unfortunately the body of delete_branches() updated by this series
is not visible in this patch, making a sensible review impossible,
but if I recall correctly from what I saw in [1/4], with no-force
set, it does not leave head_rev NULL, and instead reads the HEAD
into it, and that is eventually passed to check_branch_commit()
call.

Now, check_branch_commit() passes head_rev to branch_merged(), which
falls back to "HEAD" when head_rev is given.  That sounds incorrect
for at least two reasons.  We are only dealing with branches that do
have upstream, so fallback should not trigger and we shouldn't be
allowing head_rev to be passed.  It _might_ be debatable that it is
prudent to still expect branch_merged() not to find the upstream to
detect errors in the logic, but falling back to head_rev in such a
case does not make any sense.

> +	if (n_not_merged && !quiet)
> +		fprintf(stderr,
> +			Q_("Skipped %d branch that is not fully merged; "
> +			   "delete it with 'git branch -D' if you are sure.\n",
> +			   "Skipped %d branches that are not fully merged; "
> +			   "delete them with 'git branch -D' if you are sure.\n",
> +			   n_not_merged),
> +			n_not_merged);

I do not think we unconditionally want to see this, and "--quiet"
shouldn't be the onlyl way to squelch this message.

When !quiet, the warn_only call to check_branch_commit() would
already have reported which branches are not fully merged, and
after seeing this message a few times, even the most novice user
would know how to use "git branch -D" to remove unneeded branches.

Use of advice_if_enabled() may make it more palatable.

> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> index 45455cb8ce..c8589cd3a6 100755
> --- a/t/t3200-branch.sh
> +++ b/t/t3200-branch.sh
> @@ -1798,4 +1798,143 @@ test_expect_success '--forked requires at least one <branch>' '
>  	test_grep "at least one <branch>" err
>  '
>  
> +test_expect_success '--prune-merged: setup' '
> +	test_create_repo pm-upstream &&
> +	test_commit -C pm-upstream base &&
> +	git -C pm-upstream checkout -b next &&
> +	test_commit -C pm-upstream one-commit &&
> +	test_commit -C pm-upstream two-commit &&
> +	git -C pm-upstream branch one HEAD~ &&
> +	git -C pm-upstream branch two HEAD &&
> +	git -C pm-upstream branch wip main &&
> +	git -C pm-upstream checkout main
> +'
> +
> +test_expect_success '--prune-merged deletes branches integrated into upstream' '
> +	test_when_finished "rm -rf pm-merged" &&
> +	git clone pm-upstream pm-merged &&
> +	git -C pm-merged branch one one-commit &&
> +	git -C pm-merged branch --set-upstream-to=origin/next one &&
> +	git -C pm-merged branch two two-commit &&
> +	git -C pm-merged branch --set-upstream-to=origin/next two &&
> +
> +	git -C pm-merged branch --prune-merged "origin/*" &&
> +
> +	test_must_fail git -C pm-merged rev-parse --verify refs/heads/one &&
> +	test_must_fail git -C pm-merged rev-parse --verify refs/heads/two
> +'
> +
> +test_expect_success '--prune-merged with a literal upstream argument' '
> +	test_when_finished "rm -rf pm-literal" &&
> +	git clone pm-upstream pm-literal &&
> +	git -C pm-literal branch one one-commit &&
> +	git -C pm-literal branch --set-upstream-to=origin/next one &&
> +	git -C pm-literal branch keepme one-commit &&
> +	git -C pm-literal branch --set-upstream-to=origin/main keepme &&
> +
> +	git -C pm-literal branch --prune-merged origin/next &&
> +
> +	test_must_fail git -C pm-literal rev-parse --verify refs/heads/one &&
> +	git -C pm-literal rev-parse --verify refs/heads/keepme
> +'
> +
> +test_expect_success '--prune-merged unions multiple <branch> arguments' '
> +	test_when_finished "rm -rf pm-union" &&
> +	git clone pm-upstream pm-union &&
> +	git -C pm-union branch one one-commit &&
> +	git -C pm-union branch --set-upstream-to=origin/next one &&
> +	git -C pm-union branch two base &&
> +	git -C pm-union branch --set-upstream-to=origin/main two &&
> +
> +	git -C pm-union branch --prune-merged origin/next origin/main &&
> +
> +	test_must_fail git -C pm-union rev-parse --verify refs/heads/one &&
> +	test_must_fail git -C pm-union rev-parse --verify refs/heads/two
> +'
> +
> +test_expect_success '--prune-merged with a local-branch argument' '
> +	test_create_repo pm-local &&
> +	test_when_finished "rm -rf pm-local" &&
> +	test_commit -C pm-local base &&
> +	git -C pm-local branch topic base &&
> +	git -C pm-local config branch.topic.remote . &&
> +	git -C pm-local config branch.topic.merge refs/heads/main &&
> +	git -C pm-local checkout --detach &&
> +
> +	git -C pm-local branch --prune-merged main &&
> +
> +	test_must_fail git -C pm-local rev-parse --verify refs/heads/topic &&
> +	git -C pm-local rev-parse --verify refs/heads/main
> +'
> +
> +test_expect_success '--prune-merged spares branches with un-integrated commits' '
> +	test_when_finished "rm -rf pm-unmerged" &&
> +	git clone pm-upstream pm-unmerged &&
> +	git -C pm-unmerged checkout -b wip origin/wip &&
> +	git -C pm-unmerged branch --set-upstream-to=origin/next wip &&
> +	test_commit -C pm-unmerged local-only &&
> +	git -C pm-unmerged checkout - &&
> +
> +	git -C pm-unmerged branch --prune-merged "origin/*" 2>err &&
> +	test_grep "not fully merged" err &&
> +	test_grep "Skipped 1 branch" err &&
> +	test_grep "git branch -D" err &&
> +	test_grep ! "If you are sure you want to delete it" err &&
> +	git -C pm-unmerged rev-parse --verify refs/heads/wip
> +'
> +
> +test_expect_success '--prune-merged skips branches whose upstream is gone' '
> +	test_when_finished "rm -rf pm-upstream-gone" &&
> +	git clone pm-upstream pm-upstream-gone &&
> +	git -C pm-upstream-gone branch one one-commit &&
> +	git -C pm-upstream-gone branch --set-upstream-to=origin/next one &&
> +
> +	git -C pm-upstream-gone update-ref -d refs/remotes/origin/next &&
> +	git -C pm-upstream-gone branch --prune-merged "origin/*" &&
> +
> +	git -C pm-upstream-gone rev-parse --verify refs/heads/one
> +'
> +
> +test_expect_success '--prune-merged never deletes the checked-out branch' '
> +	test_when_finished "rm -rf pm-head" &&
> +	git clone pm-upstream pm-head &&
> +	git -C pm-head checkout -b one one-commit &&
> +	git -C pm-head branch --set-upstream-to=origin/next one &&
> +
> +	git -C pm-head branch --prune-merged "origin/*" &&
> +
> +	git -C pm-head rev-parse --verify refs/heads/one
> +'
> +
> +test_expect_success '--prune-merged spares the local default branch' '
> +	test_when_finished "rm -rf pm-default" &&
> +	git clone pm-upstream pm-default &&
> +	git -C pm-default checkout --detach &&
> +	git -C pm-default branch --prune-merged "origin/*" &&
> +	git -C pm-default rev-parse --verify refs/heads/main
> +'
> +
> +test_expect_success '--prune-merged protects the default branch by name only' '
> +	test_when_finished "rm -rf pm-default-alias" &&
> +	git clone pm-upstream pm-default-alias &&
> +	git -C pm-default-alias branch --track trunk origin/main &&
> +	git -C pm-default-alias checkout --detach &&
> +	git -C pm-default-alias branch --prune-merged "origin/*" &&
> +	git -C pm-default-alias rev-parse --verify refs/heads/main &&
> +	test_must_fail git -C pm-default-alias rev-parse --verify refs/heads/trunk
> +'
> +
> +test_expect_success '--prune-merged with literal arg also protects default-name' '
> +	test_when_finished "rm -rf pm-literal-default" &&
> +	git clone pm-upstream pm-literal-default &&
> +	git -C pm-literal-default checkout --detach &&
> +	git -C pm-literal-default branch --prune-merged origin/main &&
> +	git -C pm-literal-default rev-parse --verify refs/heads/main
> +'
> +
> +test_expect_success '--prune-merged requires at least one <branch>' '
> +	test_must_fail git -C pm-upstream branch --prune-merged 2>err &&
> +	test_grep "at least one <branch>" err
> +'
> +
>  test_done

^ permalink raw reply

* Re: [PATCH v10 2/4] branch: add --prune-merged <branch>
From: Junio C Hamano @ 2026-05-22  2:52 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget
  Cc: git, Kristoffer Haugsbakk, Johannes Sixt, Phillip Wood,
	Harald Nordgren
In-Reply-To: <718e28c7e0120a826385189213cccec1f0fce1af.1779403204.git.gitgitgadget@gmail.com>

"Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/builtin/branch.c b/builtin/branch.c
> index 1e24c95a69..29d38e9060 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -172,8 +174,8 @@ static int branch_merged(int kind, const char *name,
>  	 * any of the following code, but during the transition period,
>  	 * a gentle reminder is in order.
>  	 */
> -	if (head_rev != reference_rev) {
> -		int expect = head_rev ? repo_in_merge_bases(the_repository, rev, head_rev) : 0;
> +	if (head_rev && head_rev != reference_rev) {
> +		int expect = repo_in_merge_bases(the_repository, rev, head_rev);

Any caller who passed head_rev==NULL still used to come into this
block, set expect to 0, and have it compared with merged.  If merged
is 0 (which is what happens when reference_rev is NULL and head_rev
is NULL), the control left this block silently due to /* okay */
below (which is in the post-context that is not shown).

The updated code refuses control to come into this block when
head_rev is NULL.  I am not sure why the change in this hunk is
needed.

> @@ -748,6 +750,25 @@ static int collect_forked_branch(const struct reference *ref, void *cb_data)
>  	return 0;
>  }
>  
> +static int collect_default_branch_name(struct remote *remote, void *cb_data)
> +{
> +	struct string_list *protected = cb_data;
> +	struct ref_store *refs = get_main_ref_store(the_repository);
> +	struct strbuf head = STRBUF_INIT;
> +	const char *target;
> +
> +	strbuf_addf(&head, "refs/remotes/%s/HEAD", remote->name);
> +	target = refs_resolve_ref_unsafe(refs, head.buf,
> +					 RESOLVE_REF_NO_RECURSE, NULL, NULL);
> +	if (target) {
> +		const char *leaf = strrchr(target, '/');
> +		if (leaf)
> +			string_list_insert(protected, leaf + 1);
> +	}
> +	strbuf_release(&head);
> +	return 0;
> +}

It is strange to assume that whatever upstream repository happens to
use as its primary branch name has anything to do with how the local
repository names its primary branch.  Shouldn't this instead use
init.defaultBranch configuration or something?

> @@ -781,6 +802,63 @@ static int list_forked_branches(int argc, const char **argv)
>  	return 0;
>  }
>  
> +static int prune_merged_branches(int argc, const char **argv, int quiet)
> +{
> +	struct ref_store *refs = get_main_ref_store(the_repository);
> +	struct string_list candidates = STRING_LIST_INIT_DUP;
> +	struct string_list protected_default_names = STRING_LIST_INIT_DUP;
> +	struct strvec deletable = STRVEC_INIT;
> +	struct strbuf buf = STRBUF_INIT;
> +	struct string_list_item *item;
> +	int n_not_merged = 0;
> +	int ret = 0;
> +
> +	if (!argc)
> +		die(_("--prune-merged requires at least one <branch>"));
> +
> +	collect_forked_set(argc, argv, &candidates);

Due to poor separation of changes, all the necessary information to
assess how sane the above code is is hidden in [1/4] and not here.

> +	for_each_remote(collect_default_branch_name, &protected_default_names);

This looks inefficient, and worse, incorrect.

If we have multiple branches in argv[], they may have come from
different upstream, and because you pay attention to what the
refs/remotes/<upstream>/HEAD symrefs point at, you have multiple
such "protected" default names.  You'd compare each and every
candidates against these names using linear search in the
string_list to see if they are protected (inefficient), and you
reject removal of argv[1] even when it happens to be the same name
as the default in the repository from which the upstream of argv[3]
came from, i.e., has no relationship with argv[1] (incorrect).

> +	for_each_string_list_item(item, &candidates) {
> +		const char *short_name = item->string;
> +		const char *upstream = item->util;
> +
> +		strbuf_reset(&buf);
> +		strbuf_addf(&buf, "refs/heads/%s", short_name);
> +		if (branch_checked_out(buf.buf))
> +			continue;
> +
> +		if (string_list_has_string(&protected_default_names,
> +					   short_name))
> +			continue;
> +
> +		if (!refs_ref_exists(refs, upstream))
> +			continue;
> +
> +		strvec_push(&deletable, short_name);
> +	}
> +	strbuf_release(&buf);
> +
> +	if (deletable.nr)
> +		ret = delete_branches(deletable.nr, deletable.v,
> +				      0, FILTER_REFS_BRANCHES, quiet,
> +				      1, &n_not_merged);

Add comments on parameters, perhaps, to make it readable?

		ret = delete_branches(deletable.nr, deletable.v,
				      0, /* no force */
				      FILTER_REFS_BRANCHES,
				      quiet,
				      1, /* warn only */
				      &n_not_merged);

or something?

Unfortunately the body of delete_branches() updated by this series
is not visible in this patch, making a sensible review impossible,
but if I recall correctly from what I saw in [1/4], with no-force
set, it does not leave head_rev NULL, and instead reads the HEAD
into it, and that is eventually passed to check_branch_commit()
call.

Now, check_branch_commit() passes head_rev to branch_merged(), which
falls back to "HEAD" when head_rev is given.  That sounds incorrect
for at least two reasons.  We are only dealing with branches that do
have upstream, so fallback should not trigger and we shouldn't be
allowing head_rev to be passed.  It _might_ be debatable that it is
prudent to still expect branch_merged() not to find the upstream to
detect errors in the logic, but falling back to head_rev in such a
case does not make any sense.

> +	if (n_not_merged && !quiet)
> +		fprintf(stderr,
> +			Q_("Skipped %d branch that is not fully merged; "
> +			   "delete it with 'git branch -D' if you are sure.\n",
> +			   "Skipped %d branches that are not fully merged; "
> +			   "delete them with 'git branch -D' if you are sure.\n",
> +			   n_not_merged),
> +			n_not_merged);

I do not think we unconditionally want to see this, and "--quiet"
shouldn't be the onlyl way to squelch this message.

When !quiet, the warn_only call to check_branch_commit() would
already have reported which branches are not fully merged, and
after seeing this message a few times, even the most novice user
would know how to use "git branch -D" to remove unneeded branches.

Use of advice_if_enabled() may make it more palatable.

^ permalink raw reply

* Re: [PATCH v10 2/4] branch: add --prune-merged <branch>
From: Junio C Hamano @ 2026-05-22  2:53 UTC (permalink / raw)
  To: Harald Nordgren via GitGitGadget
  Cc: git, Kristoffer Haugsbakk, Johannes Sixt, Phillip Wood,
	Harald Nordgren
In-Reply-To: <xmqq8q9cw40a.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> "Harald Nordgren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>
>
>> diff --git a/builtin/branch.c b/builtin/branch.c
>> index 1e24c95a69..29d38e9060 100644
>> --- a/builtin/branch.c
>> +++ b/builtin/branch.c
>
> Due to the way the patch is split between 1/4 and 2/4, it is
> impossible to comment on the change to delete_branches etc. that are
> needed for this step.  I'll use "git diff master... builtin/" instead.
>

Please discard this version.  I had unnecessary draft comments that
I used as reference in it.

^ permalink raw reply

* Re: [PATCH] connect: use "service" enum for "name" argument
From: Jeff King @ 2026-05-22  4:43 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, git
In-Reply-To: <ag7AJMbav6KgSCjj@pks.im>

On Thu, May 21, 2026 at 10:19:48AM +0200, Patrick Steinhardt wrote:

> > +	switch (service) {
> > +	case GIT_CONNECT_UPLOAD_PACK:
> > +		return "git-upload-pack";
> > +	case GIT_CONNECT_RECEIVE_PACK:
> > +		return "git-receive-pack";
> > +	case GIT_CONNECT_UPLOAD_ARCHIVE:
> > +		return "git-upload-archive";
> > +	}
> > +	BUG("unknown git_connect_type: %d", service);
> > +}
> 
> Shouldn't this say "unknown git_connect_service" instead of "_type"?

Oops, yes. As you probably guessed, I started with "type" before
realizing that "service" was a better word.

The patch is in next, so the fixup on top (of jk/connect-service-enum)
is below.

-- >8 --
Subject: [PATCH] transport-helper: fix typo in BUG() message

We mistakenly refer to the git_connect_service enum as "_type" rather
than "_service". Users should never see this message in practice, but it
is slightly confusing when reading the code.

Reported-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jeff King <peff@peff.net>
---
 transport-helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/transport-helper.c b/transport-helper.c
index bf37c5280c..b672801ae4 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -630,7 +630,7 @@ static const char *connect_service_cmd(enum git_connect_service service)
 	case GIT_CONNECT_UPLOAD_ARCHIVE:
 		return "git-upload-archive";
 	}
-	BUG("unknown git_connect_type: %d", service);
+	BUG("unknown git_connect_service: %d", service);
 }
 
 static int process_connect_service(struct transport *transport,
-- 
2.54.0.618.gdbb63b8024


^ permalink raw reply related

* Re: [PATCH] http: handle absolute-path alternates from server root
From: Jeff King @ 2026-05-22  4:55 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, git, slonkazoid
In-Reply-To: <ag7xbkTF11N22waX@pks.im>

On Thu, May 21, 2026 at 01:50:06PM +0200, Patrick Steinhardt wrote:

> > Packfile URIs help with the actual pack generation (even if we're
> > blitting out bits from the disk with verbatim packfile reuse, we still
> > have to handle gaps and compute the checksum over the output pack).
> > 
> > But it doesn't help with the server computing the set of objects the
> > client needs in the first place. IIRC, packfile URIs work by the server
> > saying "oh, I was going to send you object XYZ, but you can get it from
> > this stable pack instead". So the server still has to compute the set of
> > objects (and send any that are not mentioned in URI packs). Bitmaps
> > help, but there's still non-trivial computation and storage on the
> > server.
> 
> I guess it depends on the actual server-side implementation, but in the
> general case this is of course true. A server could decide to for
> example overserve objects in case the client does a full clone, or it
> could arrange packfiles in a special way that allows it to serve at
> least some kinds of requests efficiently.

True, though you still have to receive the client wants/haves before
getting to the packfile-uri phase. The alternative is for the server
send URIs during the ref advertisement. But we have that, too, these
days: the bundle-uri feature. (Which I completely forgot about while
writing my earlier email).

So I do think that bundle-uris can probably be an adequate substitute
for dumb-http in terms of reducing server load. Though...

> Packfile URIs definitely need some love to become feasible, yes, and I
> don't think they have evolved much since their introduction. I still
> feel like they are the better mechanism for offloading traffic compared
> to bundle URIs though, as we already have packfiles around anyway.

...yeah, I agree that storing both bundles and packs can be annoying for
a server, depending on your setup. In theory it would not be hard for a
slightly-clever server endpoint to store packfiles for regular Git to
use, and then generate the bundles on the fly by cat-ing the bundle
header and the packfile, both of which can be sent out as raw bytes
without further processing.

Anyway, we are far afield from the patch that started this thread. ;) I
do agree with the general notion that we _should_ be able to get
smart-http close to the server-side expense of dumb-http with a few
tricks like these.

-Peff

^ permalink raw reply

* Re: [PATCH 4/9] run-command: add support for timeout in command finisher
From: Jeff King @ 2026-05-22  5:10 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Siddh Raman Pant, git@vger.kernel.org, gitster@pobox.com,
	newren@gmail.com, ps@pks.im, oswald.buddenhagen@gmx.de,
	code@khaugsbakk.name
In-Reply-To: <cf52154c-1275-4a4b-957e-5aa17f22705c@kdbg.org>

On Thu, May 21, 2026 at 04:36:05PM +0200, Johannes Sixt wrote:

> Am 21.05.26 um 11:59 schrieb Siddh Raman Pant:
> > The timeout is for the failure path, where the external helper has
> > already stopped following that protocol or is blocked on something
> > outside git's control. Since git starts the helper and puts it on the
> > log/grep path, git also needs a bounded way to recover when that helper
> > does not make progress. Otherwise an optional note source can prevent
> > the main git command from completing.
> 
> That Git communicates with a process that looks like it stopped is the
> normal case, for example:
> 
> - Output is sent to the pager. The user can take their time to study the
> output. All the while, git waits patiently for the user to advance the
> pager.
> 
> - Git fetch transfers large amounts of data across the network. Most of
> the time it waits for data to arrive and does nothing. The peer process
> looks like it hangs. Git does not decide to kill the connection at any
> time. It is the user's decision to do so.
> 
> If the notes provider hangs, then it is not on Git to decide when it has
> waited long enough.

Yeah, I agree with your point of view. If I understand this patch series
correctly, it is about adding an external process to map commit ids to
note data. So I can think of some existing features that are quite close
to that in nature, none of which use timeouts:

  - textconv filters and external diffs which process data in the middle
    of a git-log invocation

  - long-lived clean/smudge filters map blobs to arbitrarily large text

  - cat-file's batch mode maps object ids to user-specified data about
    that object

As you note, it's up to the command to be well-behaved. Git should
notice and respond appropriately if the command closes the pipe, of
course. Sometimes a timeout can help with a poorly behaved command, but
IMHO it is not worth the cost of non-determinism that it brings.

Moreover, the bits touching run-command here make me nervous, especially
after the challenges we saw in the child-cleanup topic that was reverted
just after v2.54. There is often a shell interposed between Git and the
sub-command, and we don't always know how the shell will react to
signals. Using SIGKILL will eventually get us _something_ to wait() on,
but it might not even be the process we care about!

I don't really care much about this external-notes feature one way or
the other, but if we are going to do it, I don't see any reason why it
would not behave like all of the other similar parts of Git.

-Peff

^ permalink raw reply

* Re: [PATCH v3] git-jump: pick a mode automatically when invoked without arguments
From: Jeff King @ 2026-05-22  5:28 UTC (permalink / raw)
  To: Greg Hurrell via GitGitGadget
  Cc: git, Greg Hurrell, Erik Cervin Edin, Junio C Hamano, Greg Hurrell
In-Reply-To: <pull.2108.v3.git.1779371110195.gitgitgadget@gmail.com>

On Thu, May 21, 2026 at 01:45:09PM +0000, Greg Hurrell via GitGitGadget wrote:

>     Changes since v2; all of these in response to feedback from Junio:
>     
>      * Removed stray # from README.
>      * Don't both teaching "auto" to select "ws" mode, because it is always
>        subsumed by "diff".
>      * Update usage string to make clear that git jump --stdout foo is not a
>        synonym for git jump --stdout auto foo, because distinguishing
>        between foo as <mode> and foo as <arg> is fraught with ambiguity.
>     
>     In answer to Junio's question:
>     
>     > If more than one interesting cases apply, what happens, and what
>     > should happen?
>     
>     it's an ordered choice (merge > diff).

Dropping the "ws" mode from auto makes sense to me. It could be slotted
in between "merge" and "diff" (a whitespace problem always implies a
diff, but a diff does not always imply a whitespace problem). But would
that actually be useful?

My impression of the "auto" feature is: I am too lazy to type, so just
take me to the interesting bits. And interesting in my experience with
git-jump is either "I am merging, take me to the conflict" or "I am
writing new code, take me to what I already did". Limiting the second
case just to whitespace violations (assuming there is at least one)
would probably be more confusing than helpful.

You could perhaps argue for one more layer of "interesting", which is:
if there are no unstaged changes, take me to the staged ones. Looking at
staged changes can be misleading if the working tree file has moved on
(since by definition we are dumping the editor into the working tree
file, not the staged contents). But if we know the diff between index
and working tree is empty, then that is not an issue.

I do sometimes use "git jump diff --cached" explicitly for that purpose.

If sounds like Greg has been living with "auto" and finding it useful
for a while. So I'm mostly inclined to take the patch as-is, and people
can experiment with it and suggest changes after using it in practice.

But here's my one final thought. The hierarchy of "merge > diff > diff
--cached", etc, makes me wonder if we could simply concatenate the
outputs.  I.e., show all of the merge conflicts and all of the diffs,
whitespace changes, and so on. But I guess you run into two issues:

  1. There will be duplicates. Every merge conflict is also a diff. And
     every whitespace violation is also a diff. So at the very least you
     need to de-dup these (which might not be entirely trivial, as they
     may overlap rather than starting at the same line).

  2. When resolving a merge, I'm not sure if the diffs are actually
     interesting. Usually I git-jump around until there are no more
     conflicts, and then I see if it builds. But I almost never want to
     jump to the other changes introduced by the side-branch. There are
     too many of them, and they are probably uninteresting. So adding
     both "merge" and "diff" elements to the quicklist would be
     annoying.

-Peff

^ permalink raw reply

* Re: [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers
From: Junio C Hamano @ 2026-05-22  5:29 UTC (permalink / raw)
  To: Michael Montalbo via GitGitGadget; +Cc: git, Michael Montalbo
In-Reply-To: <pull.2120.git.1779415884.gitgitgadget@gmail.com>

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series adds diff.<driver>.process, a long-running subprocess protocol
> that lets external tools provide hunks to git's diff and blame pipelines.
>
> Over the past 18 years, git's diff pipeline accumulated many features that
> operate on hunks: word diff, function context, color-moved, indent
> heuristic, blame. External tools can replace the pipeline entirely
> (diff.<driver>.command) or select among builtin algorithms
> (diff.<driver>.algorithm), but there is no way for a tool to provide
> line-change information into the pipeline. Tools that understand code
> structure (tree-sitter parsers, format-aware analyzers, tools like
> Difftastic and Mergiraf) must bypass git's pipeline and lose access to
> everything downstream.
>
> The protocol follows filter.<driver>.process: pkt-line over stdin/stdout,
> capability negotiation, one tool invocation per git command. The tool
> receives file pairs and returns hunk descriptors that git feeds into the
> standard xdiff pipeline. All output features work normally.
>
> Zero hunks with status=success means the tool considers the files
> equivalent. git diff shows no output for the file, and git blame skips the
> commit, attributing lines to earlier commits.
>
> On error or tool crash, git falls back silently to the builtin diff
> algorithm. The feature is opt-in via diff.<driver>.process and
> .gitattributes; unconfigured files are unaffected.
>
> The series includes git diff-process-normalize, a built-in tool that
> compares files line by line ignoring whitespace (same logic as "git diff -w"
> via xdiff_compare_lines):

Interesting.

If the goal is purely to normalize content before comparison
(e.g. stripping comments or canonicalizing formatting), we already
have the `textconv` mechanism.  While `textconv` is a "one-shot"
per-file process, it is significantly simpler.

I suspect, however, that the primary focus here is to allow external
tools to provide structural alignment (e.g. for AST- aware diffs
like Difftastic or Mergiraf) without losing the original content in
the display.  Unlike `textconv`, which transforms the text the user
sees, this protocol lets the display remain identical to the source
while using a custom engine for the line-matching logic.

If that is the intent, it should be stated more explicitly in the
documentation and commit messages.  The "whitespace-normalize"
demonstration in [PATCH 5/5] is misleading because it's exactly the
case where `textconv` would be sufficient.

I am afraid that the use of a long-running subprocess for every
diff/blame invocation adds significant complexity and overhead.  In
particular, wouldn't the `blame` implementation performs a
round-trip to the subprocess for every commit in the history?  Even
with a persistent process, the overhead of serializing and
deserializing the entire file content twice (old and new) for every
commit could be prohibitive for large files or deep histories.

So, I dunno.

^ permalink raw reply

* Re: [PATCH 1/5] xdiff: support external hunks via xpparam_t
From: Junio C Hamano @ 2026-05-22  5:29 UTC (permalink / raw)
  To: Michael Montalbo via GitGitGadget; +Cc: git, Michael Montalbo
In-Reply-To: <8c0ea0bc0742651e634db7a3002e8cbe1240acf9.1779415884.git.gitgitgadget@gmail.com>

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +/*
> + * Populate the changed[] arrays from externally supplied hunks,
> + * bypassing the diff algorithm.  Validates that hunks are in order,
> + * non-overlapping, and within bounds.
> + *
> + * Returns 0 on success, -1 on validation failure.
> + */
> +static int xdl_populate_hunks_from_external(xdfenv_t *xe,
> +					    const struct xdl_hunk *hunks,
> +					    size_t nr_hunks)
> +{
> +	size_t i;
> +	long j, prev_old_end = 0, prev_new_end = 0;
> +	long total_old = 0, total_new = 0;
> +
> +	/*
> +	 * Clear changed[] arrays.  xdl_prepare_env() may have dirtied
> +	 * them via xdl_cleanup_records().  The allocation is nrec + 2
> +	 * elements; changed points one past the start (see xprepare.c).
> +	 */
> +	memset(xe->xdf1.changed - 1, 0,
> +	       (xe->xdf1.nrec + 2) * sizeof(bool));
> +	memset(xe->xdf2.changed - 1, 0,
> +	       (xe->xdf2.nrec + 2) * sizeof(bool));

This, especially the starting offset of -1, looks horrible.  The
internal layout of xdfenv_t might happen to match the way the above
code expects, which is how xdl_prepare_ctx() may have give you, but
it somehow feels brittle.  I guess the assumption that changed[]
does not point at the beginning of the allocated area (e.g., it is a
no-no to free(xe->xdf1.changed) or realloc() it) is so pervasive that
it cannot be helped.  Sigh.

>  int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>  	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
>  	xdchange_t *xscr;
>  	xdfenv_t xe;
>  	emit_func_t ef = xecfg->hunk_func ? xdl_call_hunk_func : xdl_emit_diff;
>  
> -	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0) {
> -
> -		return -1;
> +	if (xpp->external_hunks) {
> +		if (xdl_prepare_env(mf1, mf2, xpp, &xe) < 0)
> +			return -1;
> +		if (xdl_populate_hunks_from_external(&xe,
> +						     xpp->external_hunks,
> +						     xpp->external_hunks_nr) < 0) {
> +			/*
> +			 * Invalid external hunks; fall back to the
> +			 * builtin diff algorithm.  Re-runs
> +			 * xdl_prepare_env() via xdl_do_diff().
> +			 */
> +			xdl_free_env(&xe);
> +			if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0)
> +				return -1;

If the external tool keeps sending bogus hunks, silently falling
back to what we would have done if there weren't any external stuff
may be necessary to pleasantly keep using Git, but two and a half
short comments here.

 (1) "What we would have done" is exactly the same as what appears
     in the corresponding "else" block.  Can we make sure that we do
     not have to keep updating both copies in the future with some
     code rearrangement?

 (2) The writer of the external tool may want to see some trace of
     warning under certain flags when a failure of the tool forces
     the receiving end to fallback.

 (3) If the tool throws too many broken replies, perhaps we want to
     disable it automatically?

> +		}
> +	} else {
> +		if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0)
> +			return -1;
>  	}
> +
>  	if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
>  	    xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 ||
>  	    xdl_build_script(&xe, &xscr) < 0) {

^ permalink raw reply

* Re: [PATCH] log: let --follow follow renames in merge commits
From: Jeff King @ 2026-05-22  5:43 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: Elijah Newren, git
In-Reply-To: <ag2265RJal-tJLoW@collabora.com>

On Wed, May 20, 2026 at 03:28:11PM +0200, Miklos Vajna wrote:

> Hi Elijah, Jeff,
> 
> On Tue, May 19, 2026 at 03:37:54PM +0900, Junio C Hamano <gitster@pobox.com> wrote:
> > > :-) Should I just wait more or should I resend this?
> > 
> > Rather, ask other reviewers
> 
> I did a small improvement to how 'git log --follow' works, as in if the
> rename happens inside the merge commit itself, then the rename was
> detected "vs the first parent", but it wasn't detected "vs other
> parents", which is painful with a "subtree" merge commit.
> 
> I'm not sure if it adds value, but I can append a one-paragraph summary
> of Junio's comment in this thread to the end the commit message, to be
> more explicit that the inherent limitation of the current log follow
> design (single path, once a rename is detected, we only care about the
> new path) is not changed with the patch, this is just a fix patch so
> 'git log' works better, similar to how 'git blame' already does.
> 
> May I ask you to review the patch?

I saw Junio's comment. I was about to write something very similar
before I saw that he had already done so. ;)

I think we can probably all agree that both before and after your patch,
--follow is never going to do the _right_ thing, which is to follow
paths independently down both sides of history.

I am OK conceptually with making the current broken behavior slightly
more useful if it is easy to do. But I am not sure if we are making
things more useful here or not. If we see a merge where the file "bar"
was previous "foo" on one side and "bar" on the other, our broken follow
is going to either pick "foo" or "bar" to continue with as we traverse.
But which one is right? Whichever name we choose, we are potentially
omitting results from the other side.

Right now we pick the first-parent name always. But it does not seem
more correct to me to pick one from another parent. You'd be missing
further commits using the original name along the first-parent track.

There might be a more useful rule like: if the path is untouched versus
the merge result in all parents but one (i.e., TREESAME), then choose
the parent where it was changed, including any --follow processing. But
we already do something like that for history simplification. Which
makes me wonder if you could get the results you want through some use
of history-simplification flags.

Or maybe we already do that TREESAME check. Simplification kicks in when
the traversal is limited by path, and --follow mode by definition has
such a path. But I'm not sure if the --follow code would see the
simplified parent list or not.

So I dunno. Probably some experimenting could yield more analysis there,
but with the patch as-is I'm not convinced that it is not going to make
some cases worse.

-Peff

^ permalink raw reply

* Re: [PATCH 4/9] run-command: add support for timeout in command finisher
From: Siddh Raman Pant @ 2026-05-22  5:46 UTC (permalink / raw)
  To: j6t@kdbg.org, gitster@pobox.com
  Cc: git@vger.kernel.org, newren@gmail.com, ps@pks.im,
	oswald.buddenhagen@gmx.de, code@khaugsbakk.name
In-Reply-To: <xmqqv7cgxq0o.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 907 bytes --]

On Fri, May 22 2026 at 05:40:47 +0530, Junio C Hamano wrote:
> If a protocol builds its own way to declare "this backend is slow,
> so please do not consider less than 3 seconds of nonaction something
> to worry about but kill it off if you waited more than that" to make
> the receiving/waiting end responsible for managing timeout, that
> might be workable, but it certainly feels like a kludge.  The
> protocol can instead allow an "error - for your particular request,
> we couldn't come up with an answer within a reasonable time limit"
> response (in practice, "within time limit" does not have to be the
> only reason for such an error) to be returned, I think.

I think we are confusing two different commits here.

The response read deadline is the next commit. This commit is about
force-killing a process if it doesn't respond to the initial
termination signal.

Thanks,
Siddh

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 0/8] setup: centralize object database creation
From: Patrick Steinhardt @ 2026-05-22  6:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqqzn4xp0c.fsf@gitster.g>

On Fri, May 22, 2026 at 09:32:35AM +0900, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > The series is built on top of aec3f58750 (Sync with 'maint', 2026-05-21)
> > with ps/setup-wo-the-repository at df69f40c34 (setup: stop using
> > `the_repository` in `init_db()`, 2026-05-19) merged into it.
> 
> FWIW, this merge needs the following merge-fix squashed into it,
> for the topic to build standalone.
> 
> commit ce350f62ceb26f3276ea3b7ad78b7f8cb4c35cf7
> Author: Junio C Hamano <gitster@pobox.com>
> Date:   Wed May 13 12:20:29 2026 +0900
> 
>     merge-fix/ps/setup-wo-the-repository
>     
>     with  js/objects-larger-than-4gb-on-windows
> 
> diff --git a/t/helper/test-synthesize.c b/t/helper/test-synthesize.c
> index 1f28ecf0f2..3fa534fbdf 100644
> --- a/t/helper/test-synthesize.c
> +++ b/t/helper/test-synthesize.c
> @@ -506,7 +506,7 @@ static int cmd__synthesize__pack(int argc, const char **argv,
>  		OPT_END()
>  	};
>  
> -	setup_git_directory_gently(&non_git);
> +	setup_git_directory_gently(the_repository, &non_git);
>  	repo = the_repository;
>  	algo = unsafe_hash_algo(repo->hash_algo);

Oh, right, I should have mentioned this. I do have the same fixup on top
of the merge, thanks.

Patrick

^ permalink raw reply

* Re: [PATCH 8/8] setup: construct object database in `apply_repository_format()`
From: Patrick Steinhardt @ 2026-05-22  6:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq4ik0zls3.fsf@gitster.g>

On Fri, May 22, 2026 at 02:59:24AM +0900, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > With the preceding changes we now always construct the repository's
> > object database before applying the repository format. Remove this
> > duplication by constructing it in `apply_repository_format()` instead.
> >
> > Note that we create the object database _after_ having set up the
> > repository's hash algorithm, but _before_ setting the compat hash
> > algorithm. This is intentional:
> >
> >   - Constructing the object database may require knowledge of its
> >     intended object format.
> >
> >   - Setting up the compatibility hash requires the object database to be
> >     initialized already, because we immediately read the loose object
> >     map.
> >
> > The first point is sensible, the second maybe a little less so. Ideally,
> > it should be the responsibility of the object database itself to
> > initialize any data structures required for the compatibility hash. But
> > this would require further changes, so this is kept as-is for now.
> 
> Yeah, I guess it is a good place to stop, instead of solving the
> chicken-and-egg problem in one go.
> 
> > Further note that this requires us to move handling of the environment
> > variables GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES into
> > the repository format, as well. This allows the caller more flexibility
> > around whether or not those environment variables are being honored, as
> > we do do want to respect them in "setup.c", but not in "repository.c".
> 
> It seems that we really really really want to do so ;-).  "do do
> want to" -> "do want to" or even "want to", perhaps.

Fixed locally, thanks! :)

Patrick

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox