Git development
 help / color / mirror / Atom feed
* [PATCH v2 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper()
From: Elijah Newren via GitGitGadget @ 2026-06-14  6:37 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Christian Couder, Elijah Newren,
	Elijah Newren, Elijah Newren
In-Reply-To: <pull.2096.v2.git.1781419047.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

traverse_trees_wrapper() saves entries from a first pass through
traverse_trees() and then replays them through the real callback
(collect_merge_info_callback).  However, the replay loop silently
discards the callback return value.  This is not a deferred error;
it is an ignored error.

Today the only originator of a negative return in this entire call
graph is traverse_trees()'s "exceeded maximum allowed tree depth"
check; everything else (collect_merge_info_callback,
traverse_trees_wrapper, the inner traverse_trees recursion) only
relays that.  So in current Git, the visible effect of dropping the
replay callback's return value is narrow but bad: a tree nested past
core.maxTreeDepth has its -1 swallowed, the subtree below the limit
is silently pruned, and the merge completes as if that were the
correct result.

A later patch in this series will teach collect_merge_info_callback()
to return -1 on an additional path -- detecting duplicate
entries in malformed trees -- which is similarly handled today by
just ignoring the problem (resulting in mostly a "last one wins" rule,
though the non-last entry can mutate various state flags).

Capture the return value, stop the loop on negative returns, and
propagate the error to the caller.  The callback returns a positive mask
value on success, so normalize non-negative returns to
0 for the caller.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 00923ce3cd..4b8e32209d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1008,18 +1008,20 @@ static int traverse_trees_wrapper(struct index_state *istate,
 	info->traverse_path = renames->callback_data_traverse_path;
 	info->fn = old_fn;
 	for (i = old_offset; i < renames->callback_data_nr; ++i) {
-		info->fn(n,
-			 renames->callback_data[i].mask,
-			 renames->callback_data[i].dirmask,
-			 renames->callback_data[i].names,
-			 info);
+		ret = info->fn(n,
+			       renames->callback_data[i].mask,
+			       renames->callback_data[i].dirmask,
+			       renames->callback_data[i].names,
+			       info);
+		if (ret < 0)
+			break;
 	}
 
 	renames->callback_data_nr = old_offset;
 	free(renames->callback_data_traverse_path);
 	renames->callback_data_traverse_path = old_callback_data_traverse_path;
 	info->traverse_path = NULL;
-	return 0;
+	return ret < 0 ? ret : 0;
 }
 
 static void setup_path_info(struct merge_options *opt,
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info()
From: Elijah Newren via GitGitGadget @ 2026-06-14  6:37 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Christian Couder, Elijah Newren,
	Elijah Newren, Elijah Newren
In-Reply-To: <pull.2096.v2.git.1781419047.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

collect_merge_info() has set info.show_all_errors = 1 since
d2bc1994f363 (merge-ort: implement a very basic collect_merge_info(),
2020-12-13).  This setting was copied from unpack-trees.c where it
controls batching of error messages for porcelain display, but
merge-ort has no such error-batching logic and never needed it.

With show_all_errors set, traverse_trees() captures a negative callback
return but continues processing remaining entries rather than stopping
immediately.  Removing the setting restores the default behavior where
a negative return from collect_merge_info_callback() breaks out of the
traversal loop right away, allowing a future commit to exit early when
a corrupt tree is detected.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 4b8e32209d..74e9636020 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1740,7 +1740,6 @@ static int collect_merge_info(struct merge_options *opt,
 	setup_traverse_info(&info, opt->priv->toplevel_dir);
 	info.fn = collect_merge_info_callback;
 	info.data = opt;
-	info.show_all_errors = 1;
 
 	if (repo_parse_tree(opt->repo, merge_base) < 0 ||
 	    repo_parse_tree(opt->repo, side1) < 0 ||
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
From: Elijah Newren via GitGitGadget @ 2026-06-14  6:37 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Christian Couder, Elijah Newren,
	Elijah Newren, Elijah Newren
In-Reply-To: <pull.2096.v2.git.1781419047.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

clear_or_reinit_internal_opts() is responsible for cleaning up the
various data structures in merge_options_internal.  It already handles
many renames-related structures (dirs_removed, dir_renames,
relevant_sources, cached_pairs, deferred, etc.) but does not free
renames->pairs[].queue.

In the normal code path, resolve_and_process_renames() frees
pairs[s].queue and reinitializes it with diff_queue_init() before
clear_or_reinit_internal_opts() runs, so the omission is harmless.
However, if collect_merge_info() encounters an error and returns early
(before resolve_and_process_renames() is ever called), any diff pairs
already queued by collect_rename_info()/add_pair() will have their
backing array leaked.

Fix this by freeing renames->pairs[].queue in the cleanup function.
In the normal path the pointer is already NULL (from the earlier
diff_queue_init() in resolve_and_process_renames()), so free(NULL) is
a safe no-op.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 74e9636020..8f911cb639 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -728,6 +728,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
 		strintmap_clear_func(&renames->deferred[i].possible_trivial_merges);
 		strset_clear_func(&renames->deferred[i].target_dirs);
 		renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
+		free(renames->pairs[i].queue);
+		diff_queue_init(&renames->pairs[i]);
 	}
 	renames->cached_pairs_valid_side = 0;
 	renames->dir_rename_mask = 0;
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 4/5] merge-ort: abort merge when trees have duplicate entries
From: Elijah Newren via GitGitGadget @ 2026-06-14  6:37 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Christian Couder, Elijah Newren,
	Elijah Newren, Elijah Newren
In-Reply-To: <pull.2096.v2.git.1781419047.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

Trees with duplicate entries are malformed; fsck reports "contains
duplicate file entries" for them.  merge-ort has from the beginning
assumed that we would never hit such trees.  It was written with the
assumption that traverse_trees() calls collect_merge_info_callback() at
most once per path.  The "sanity checks" in that callback (added in
d2bc1994f363 (merge-ort: implement a very basic collect_merge_info(),
2020-12-13)) verify properties of each individual call but not that
invariant.  The strmap_put() in setup_path_info() silently overwrites
the entry from any prior call for the same path, because it assumed
there would be no other path.  Unfortunately, supplemental data
structures for various optimizations could still be tweaked before the
extra paths were overwritten, and those data structures not matching
expected state could trip various assertions.

Change the return type of setup_path_info() from void to int to allow us
to detect this case, and abort the merge with a clear error message when
it occurs.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c                          | 61 ++++++++++++++++------------
 t/t6422-merge-rename-corner-cases.sh | 54 ++++++++++++++++++++++++
 2 files changed, 88 insertions(+), 27 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 8f911cb639..be0829bbb7 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1026,18 +1026,18 @@ static int traverse_trees_wrapper(struct index_state *istate,
 	return ret < 0 ? ret : 0;
 }
 
-static void setup_path_info(struct merge_options *opt,
-			    struct string_list_item *result,
-			    const char *current_dir_name,
-			    int current_dir_name_len,
-			    char *fullpath, /* we'll take over ownership */
-			    struct name_entry *names,
-			    struct name_entry *merged_version,
-			    unsigned is_null,     /* boolean */
-			    unsigned df_conflict, /* boolean */
-			    unsigned filemask,
-			    unsigned dirmask,
-			    int resolved          /* boolean */)
+static int setup_path_info(struct merge_options *opt,
+			   struct string_list_item *result,
+			   const char *current_dir_name,
+			   int current_dir_name_len,
+			   char *fullpath, /* we'll take over ownership */
+			   struct name_entry *names,
+			   struct name_entry *merged_version,
+			   unsigned is_null,     /* boolean */
+			   unsigned df_conflict, /* boolean */
+			   unsigned filemask,
+			   unsigned dirmask,
+			   int resolved          /* boolean */)
 {
 	/* result->util is void*, so mi is a convenience typed variable */
 	struct merged_info *mi;
@@ -1081,9 +1081,11 @@ static void setup_path_info(struct merge_options *opt,
 			 */
 			mi->is_null = 1;
 	}
-	strmap_put(&opt->priv->paths, fullpath, mi);
+	if (strmap_put(&opt->priv->paths, fullpath, mi))
+		return error(_("tree has duplicate entries for '%s'"), fullpath);
 	result->string = fullpath;
 	result->util = mi;
+	return 0;
 }
 
 static void add_pair(struct merge_options *opt,
@@ -1350,9 +1352,10 @@ static int collect_merge_info_callback(int n,
 	 */
 	if (side1_matches_mbase && side2_matches_mbase) {
 		/* mbase, side1, & side2 all match; use mbase as resolution */
-		setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
-				names, names+0, mbase_null, 0 /* df_conflict */,
-				filemask, dirmask, 1 /* resolved */);
+		if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+				    names, names+0, mbase_null, 0 /* df_conflict */,
+				    filemask, dirmask, 1 /* resolved */))
+			return -1; /* Quit traversing */
 		return mask;
 	}
 
@@ -1364,9 +1367,10 @@ static int collect_merge_info_callback(int n,
 	 */
 	if (sides_match && filemask == 0x07) {
 		/* use side1 (== side2) version as resolution */
-		setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
-				names, names+1, side1_null, 0,
-				filemask, dirmask, 1);
+		if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+				    names, names+1, side1_null, 0,
+				    filemask, dirmask, 1))
+			return -1; /* Quit traversing */
 		return mask;
 	}
 
@@ -1378,18 +1382,20 @@ static int collect_merge_info_callback(int n,
 	 */
 	if (side1_matches_mbase && filemask == 0x07) {
 		/* use side2 version as resolution */
-		setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
-				names, names+2, side2_null, 0,
-				filemask, dirmask, 1);
+		if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+				    names, names+2, side2_null, 0,
+				    filemask, dirmask, 1))
+			return -1; /* Quit traversing */
 		return mask;
 	}
 
 	/* Similar to above but swapping sides 1 and 2 */
 	if (side2_matches_mbase && filemask == 0x07) {
 		/* use side1 version as resolution */
-		setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
-				names, names+1, side1_null, 0,
-				filemask, dirmask, 1);
+		if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+				    names, names+1, side1_null, 0,
+				    filemask, dirmask, 1))
+			return -1; /* Quit traversing */
 		return mask;
 	}
 
@@ -1413,8 +1419,9 @@ static int collect_merge_info_callback(int n,
 	 * unconflict some more cases, but that comes later so all we can
 	 * do now is record the different non-null file hashes.)
 	 */
-	setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
-			names, NULL, 0, df_conflict, filemask, dirmask, 0);
+	if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+			    names, NULL, 0, df_conflict, filemask, dirmask, 0))
+		return -1; /* Quit traversing */
 
 	ci = pi.util;
 	VERIFY_CI(ci);
diff --git a/t/t6422-merge-rename-corner-cases.sh b/t/t6422-merge-rename-corner-cases.sh
index e18d5a227d..81b645bb3b 100755
--- a/t/t6422-merge-rename-corner-cases.sh
+++ b/t/t6422-merge-rename-corner-cases.sh
@@ -1525,4 +1525,58 @@ test_expect_success 'submodule/directory preliminary conflict' '
 	)
 '
 
+# Testcase: submodule/directory conflict with duplicate tree entries
+#   One side has a path as a gitlink (submodule).  The other side replaces
+#   the gitlink with a directory.  A third-party tool creates a tree on the
+#   submodule side that has *both* a gitlink and a tree entry for the same
+#   path (adding a file inside the submodule path ignoring that there's a
+#   gitlink there).  collect_merge_info_callback() should detect the
+#   duplicate and abort rather than silently corrupting its bookkeeping.
+
+test_expect_success 'duplicate tree entries trigger an error' '
+	test_when_finished "rm -rf duplicate-entry" &&
+	git init duplicate-entry &&
+	(
+		cd duplicate-entry &&
+
+		# Base commit: "docs" is a gitlink (submodule)
+		empty_tree=$(git mktree </dev/null) &&
+		fake_commit=$(git commit-tree $empty_tree </dev/null) &&
+		git update-index --add --cacheinfo 160000,$fake_commit,docs &&
+		echo base >file.txt &&
+		git add file.txt &&
+		git commit -m base &&
+
+		# side1: remove the gitlink, replace with a directory
+		git checkout -b side1 &&
+		git rm --cached docs &&
+		mkdir -p docs &&
+		echo hello >docs/requirements.txt &&
+		git add docs/requirements.txt &&
+		git commit -m "side1: submodule to directory" &&
+
+		# side2: keep the gitlink but craft a tree that also
+		# contains a tree entry for "docs" (simulating a tool
+		# that adds files inside a submodule path without
+		# removing the gitlink first).
+		git checkout main &&
+		git checkout -b side2 &&
+		blob_oid=$(echo world | git hash-object -w --stdin) &&
+		docs_tree=$(printf "100644 blob %s\trequirements.txt\n" \
+			"$blob_oid" | git mktree) &&
+		cur_tree=$(git rev-parse HEAD^{tree}) &&
+		git cat-file -p $cur_tree >tree-listing &&
+		printf "040000 tree %s\tdocs\n" "$docs_tree" >>tree-listing &&
+		new_tree=$(git mktree <tree-listing) &&
+		side2_commit=$(git commit-tree $new_tree -p HEAD \
+			-m "side2: add file alongside submodule") &&
+		git update-ref refs/heads/side2 $side2_commit &&
+
+		# Merging must detect the duplicate and abort
+		git checkout side1 &&
+		test_must_fail git merge side2 2>err &&
+		test_grep "duplicate entries" err
+	)
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
From: Elijah Newren via GitGitGadget @ 2026-06-14  6:37 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Christian Couder, Elijah Newren,
	Elijah Newren, Elijah Newren
In-Reply-To: <pull.2096.v2.git.1781419047.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

verify_cache() checks that the index does not contain both "path" and
"path/file" before writing a tree.  It does this by comparing only
adjacent entries, relying on the assumption that "path/file" would
immediately follow "path" in sorted order.  Unfortunately, this
assumption does not always hold.  For example:

    docs                     <-- submodule entry
    docs-internal/README.md  <-- intervening entry
    docs/requirements.txt    <-- D/F conflict, NOT adjacent to "docs"

When this happens, verify_cache() silently misses the D/F conflict and
write-tree produces a corrupt tree object containing duplicate entries
(one for the submodule "docs" and one for the tree "docs").

I could not find any caller in current git that both allows the index to
get into this state and then tries to write it out without doing other
checks beyond the verify_cache() call in cache_tree_update(), but
verify_cache() is documented as a safety net for preventing corrupt
trees and should actually provide that guarantee.  A downstream consumer
that relied solely on cache_tree_update()'s internal checking via
verify_cache() to prevent duplicate tree entries was bitten by the gap.

Add a test that constructs a corrupt index directly (bypassing the D/F
checks in add_index_entry) and verifies that write-tree now rejects it.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 cache-tree.c                   | 64 ++++++++++++++++++++++++++++------
 t/meson.build                  |  1 +
 t/t0093-direct-index-write.pl  | 38 ++++++++++++++++++++
 t/t0093-verify-cache-df-gap.sh | 59 +++++++++++++++++++++++++++++++
 4 files changed, 151 insertions(+), 11 deletions(-)
 create mode 100644 t/t0093-direct-index-write.pl
 create mode 100755 t/t0093-verify-cache-df-gap.sh

diff --git a/cache-tree.c b/cache-tree.c
index 7881b42aa2..4d2669b312 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -161,6 +161,54 @@ void cache_tree_invalidate_path(struct index_state *istate, const char *path)
 		istate->cache_changed |= CACHE_TREE_CHANGED;
 }
 
+/*
+ * Check whether this_ce and the next entry in the index form a D/F
+ * conflict ("path" vs "path/file").  Returns the conflicting "path/..."
+ * name when one is found, or NULL otherwise.
+ *
+ * The cache is sorted, so "path/file" sorts after "path" and the
+ * conflict is usually visible as adjacent entries.  But other entries
+ * can sort between them -- e.g. "path-internal" sits between "path"
+ * and "path/file" because '-' (0x2D) precedes '/' (0x2F) -- so when
+ * the immediately following entry shares our prefix but starts with a
+ * character that sorts before '/', binary search for "path/" instead.
+ */
+static const char *find_df_conflict(struct index_state *istate,
+				    const struct cache_entry *this_ce,
+				    const struct cache_entry *next_ce)
+{
+	const char *this_name = this_ce->name;
+	const char *next_name = next_ce->name;
+	int this_len = ce_namelen(this_ce);
+	const struct cache_entry *other;
+	struct strbuf probe = STRBUF_INIT;
+	int pos;
+
+	if (this_len >= ce_namelen(next_ce) ||
+	    next_name[this_len] > '/' ||
+	    strncmp(this_name, next_name, this_len))
+		return NULL;
+
+	if (next_name[this_len] == '/')
+		return next_name;
+
+	strbuf_add(&probe, this_name, this_len);
+	strbuf_addch(&probe, '/');
+	pos = index_name_pos_sparse(istate, probe.buf, probe.len);
+	strbuf_release(&probe);
+
+	if (pos < 0)
+		pos = -pos - 1;
+	if (pos >= (int)istate->cache_nr)
+		return NULL;
+	other = istate->cache[pos];
+	if (ce_namelen(other) > this_len &&
+	    other->name[this_len] == '/' &&
+	    !strncmp(this_name, other->name, this_len))
+		return other->name;
+	return NULL;
+}
+
 static int verify_cache(struct index_state *istate, int flags)
 {
 	unsigned i, funny;
@@ -190,24 +238,18 @@ static int verify_cache(struct index_state *istate, int flags)
 	 */
 	funny = 0;
 	for (i = 0; i + 1 < istate->cache_nr; i++) {
-		/* path/file always comes after path because of the way
-		 * the cache is sorted.  Also path can appear only once,
-		 * which means conflicting one would immediately follow.
-		 */
 		const struct cache_entry *this_ce = istate->cache[i];
 		const struct cache_entry *next_ce = istate->cache[i + 1];
-		const char *this_name = this_ce->name;
-		const char *next_name = next_ce->name;
-		int this_len = ce_namelen(this_ce);
-		if (this_len < ce_namelen(next_ce) &&
-		    next_name[this_len] == '/' &&
-		    strncmp(this_name, next_name, this_len) == 0) {
+		const char *conflict_name;
+
+		conflict_name = find_df_conflict(istate, this_ce, next_ce);
+		if (conflict_name) {
 			if (10 < ++funny) {
 				fprintf(stderr, "...\n");
 				break;
 			}
 			fprintf(stderr, "You have both %s and %s\n",
-				this_name, next_name);
+				this_ce->name, conflict_name);
 		}
 	}
 	if (funny)
diff --git a/t/meson.build b/t/meson.build
index 7528e5cda5..362177999b 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -124,6 +124,7 @@ integration_tests = [
   't0090-cache-tree.sh',
   't0091-bugreport.sh',
   't0092-diagnose.sh',
+  't0093-verify-cache-df-gap.sh',
   't0095-bloom.sh',
   't0100-previous.sh',
   't0101-at-syntax.sh',
diff --git a/t/t0093-direct-index-write.pl b/t/t0093-direct-index-write.pl
new file mode 100644
index 0000000000..2881a3ebb2
--- /dev/null
+++ b/t/t0093-direct-index-write.pl
@@ -0,0 +1,38 @@
+#!/usr/bin/perl
+#
+# Build a v2 index file from entries listed on stdin.
+# Each line: "octalmode hex-oid name"
+# Output: binary index written to stdout.
+#
+# This bypasses all D/F safety checks in add_index_entry(), simulating
+# what happens when code uses ADD_CACHE_JUST_APPEND to bulk-load entries.
+use strict;
+use warnings;
+use Digest::SHA qw(sha1 sha256);
+
+my $hash_algo = $ENV{'GIT_DEFAULT_HASH'} || 'sha1';
+my $hash_func = $hash_algo eq 'sha256' ? \&sha256 : \&sha1;
+
+my @entries;
+while (my $line = <STDIN>) {
+	chomp $line;
+	my ($mode, $oid_hex, $name) = split(/ /, $line, 3);
+	push @entries, [$mode, $oid_hex, $name];
+}
+
+my $body = "DIRC" . pack("NN", 2, scalar @entries);
+
+for my $ent (@entries) {
+	my ($mode, $oid_hex, $name) = @{$ent};
+	# 10 x 32-bit stat fields (zeroed), with mode in position 7
+	my $stat = pack("N10", 0, 0, 0, 0, 0, 0, oct($mode), 0, 0, 0);
+	my $oid = pack("H*", $oid_hex);
+	my $flags = pack("n", length($name) & 0xFFF);
+	my $entry = $stat . $oid . $flags . $name . "\0";
+	# Pad to 8-byte boundary
+	while (length($entry) % 8) { $entry .= "\0"; }
+	$body .= $entry;
+}
+
+binmode STDOUT;
+print $body . $hash_func->($body);
diff --git a/t/t0093-verify-cache-df-gap.sh b/t/t0093-verify-cache-df-gap.sh
new file mode 100755
index 0000000000..0b6829d805
--- /dev/null
+++ b/t/t0093-verify-cache-df-gap.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+
+test_description='verify_cache() must catch non-adjacent D/F conflicts
+
+Ensure that verify_cache() can complain about bad entries like:
+
+  docs               <-- submodule
+  docs-internal/...  <-- sorts here because "-" < "/"
+  docs/...           <-- D/F conflict with "docs" above, not adjacent
+
+In order to test verify_cache, we directly construct a corrupt index
+(bypassing the D/F safety checks in add_index_entry) and verify that
+write-tree rejects it.
+'
+
+. ./test-lib.sh
+
+if ! test_have_prereq PERL
+then
+	skip_all='skipping verify_cache D/F tests; Perl not available'
+	test_done
+fi
+
+# Build a v2 index from entries on stdin, bypassing D/F checks.
+# Each line: "octalmode hex-oid name" (entries must be pre-sorted).
+build_corrupt_index () {
+	perl "$TEST_DIRECTORY/t0093-direct-index-write.pl" >"$1"
+}
+
+test_expect_success 'setup objects' '
+	test_commit base &&
+	BLOB=$(git rev-parse HEAD:base.t) &&
+	SUB_COMMIT=$(git rev-parse HEAD)
+'
+
+test_expect_success 'adjacent D/F conflict is caught by verify_cache' '
+	cat >index-entries <<-EOF &&
+	0160000 $SUB_COMMIT docs
+	0100644 $BLOB docs/requirements.txt
+	EOF
+	build_corrupt_index .git/index <index-entries &&
+
+	test_must_fail git write-tree 2>err &&
+	test_grep "You have both docs and docs/requirements.txt" err
+'
+
+test_expect_success 'non-adjacent D/F conflict is caught by verify_cache' '
+	cat >index-entries <<-EOF &&
+	0160000 $SUB_COMMIT docs
+	0100644 $BLOB docs-internal/README.md
+	0100644 $BLOB docs/requirements.txt
+	EOF
+	build_corrupt_index .git/index <index-entries &&
+
+	test_must_fail git write-tree 2>err &&
+	test_grep "You have both docs and docs/requirements.txt" err
+'
+
+test_done
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH v2 3/3] replay: offer an option to linearize the commit topology
From: Elijah Newren @ 2026-06-14  6:56 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, Johannes Schindelin
In-Reply-To: <20260610-toon-git-replay-drop-merges-v2-3-5714a71c6d83@iotcl.com>

Hi,

On Wed, Jun 10, 2026 at 7:51 AM Toon Claes <toon@iotcl.com> wrote:
>
> From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
>
> One of the stated goals of git-replay(1) is to allow implementing the
> git-rebase(1) functionality on the server side.
>
> The default mode of git-rebase(1) is to act as if `--no-rebase-merges`
> was given. This mode drops merge commits instead of replaying them, and
> linearizes the commit history into a sequence of the
> regular (single-parent) commits.
>
> Add option `--linearize` to git-replay(1) to do the same.

I think this version is nicer overall than the one from my
replay-upstream branch; sorry for repeatedly getting distracted from
that, but this does look nice.

A few small comments:

> Co-authored-by: Toon Claes <toon@iotcl.com>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> Signed-off-by: Toon Claes <toon@iotcl.com>
> ---
>  Documentation/git-replay.adoc |  5 +++++
>  builtin/replay.c              |  4 ++++
>  replay.c                      | 30 +++++++++++++++++++++++-------
>  replay.h                      |  5 +++++
>  t/t3650-replay-basics.sh      | 26 ++++++++++++++++++++++++++
>  5 files changed, 63 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/git-replay.adoc b/Documentation/git-replay.adoc
> index a32f72aead..41c96c7061 100644
> --- a/Documentation/git-replay.adoc
> +++ b/Documentation/git-replay.adoc
> @@ -88,6 +88,11 @@ incompatible with `--contained` (which is a modifier for `--onto` only).
>  +
>  The default mode can be configured via the `replay.refAction` configuration variable.
>
> +--linearize::
> +       In this mode, `git replay` imitates `git rebase --no-rebase-merges`,
> +       i.e. it cherry-picks only non-merge commits, each one on top of the
> +       previous one.

The SYNOPSIS block at the top of the file is missing this new flag.

The replay_usage[] variable in cmd_replay is also missing this new flag.

>  <revision-range>::
>         Range of commits to replay; see "Specifying Ranges" in
>         linkgit:git-rev-parse[1]. In `--advance=<branch>` or
> diff --git a/builtin/replay.c b/builtin/replay.c
> index 39e3a86f6c..fedfe46dc6 100644
> --- a/builtin/replay.c
> +++ b/builtin/replay.c
> @@ -111,6 +111,8 @@ int cmd_replay(int argc,
>                              N_("mode"),
>                              N_("control ref update behavior (update|print)"),
>                              PARSE_OPT_NONEG),
> +               OPT_BOOL(0, "linearize", &opts.linearize,
> +                        N_("ignore merge commits instead of replaying them")),

"ignore" feels a bit ambiguous to me.  Can we use "drop" instead,
matching your commit message?

>                 OPT_END()
>         };
>
> @@ -132,6 +134,8 @@ int cmd_replay(int argc,
>                                   opts.contained, "--contained");
>         die_for_incompatible_opt2(!!opts.ref, "--ref",
>                                   !!opts.contained, "--contained");
> +       die_for_incompatible_opt2(!!opts.revert, "--revert",
> +                                 opts.linearize, "--linearize");

Sensible; should the docs mention this incompatibility?  (I'm not sure
myself; just throwing it out as food for thought.)

>
>         /* Parse ref action mode from command line or config */
>         ref_mode = get_ref_action_mode(repo, ref_action);
> diff --git a/replay.c b/replay.c
> index 7921d7dba3..81033fb889 100644
> --- a/replay.c
> +++ b/replay.c
> @@ -277,12 +277,16 @@ static struct commit *pick_regular_commit(struct repository *repo,
>                                           struct commit *onto,
>                                           struct merge_options *merge_opt,
>                                           struct merge_result *result,
> +                                         struct commit *replayed_base,
>                                           bool reverse,
>                                           enum replay_empty_commit_action empty)
>  {
> -       struct commit *base, *replayed_base;
> +       struct commit *base;
>         struct tree *pickme_tree, *base_tree, *replayed_base_tree;
>
> +       if (replayed_base && reverse)
> +               BUG("Linearizing commits is not supported when replaying in reverse");
> +

This is dead code given the die_for_incompatible_opt2 check above,
right?  Just extra defense in depth?

>         if (pickme->parents) {
>                 base = pickme->parents->item;
>                 base_tree = repo_get_commit_tree(repo, base);
> @@ -291,7 +295,8 @@ static struct commit *pick_regular_commit(struct repository *repo,
>                 base_tree = lookup_tree(repo, repo->hash_algo->empty_tree);
>         }
>
> -       replayed_base = get_mapped_commit(replayed_commits, base, onto);
> +       if (!replayed_base)
> +               replayed_base = get_mapped_commit(replayed_commits, base, onto);
>         replayed_base_tree = repo_get_commit_tree(repo, replayed_base);
>         pickme_tree = repo_get_commit_tree(repo, pickme);
>
> @@ -430,12 +435,23 @@ int replay_revisions(struct rev_info *revs,
>         while ((commit = get_revision(revs))) {
>                 const struct name_decoration *decoration;
>
> -               if (commit->parents && commit->parents->next)
> -                       die(_("replaying merge commits is not supported yet!"));
> +               if (commit->parents && commit->parents->next) {
> +                       if (!opts->linearize)
> +                               die(_("replaying merge commits is not supported yet!"));
> +                       /*
> +                        * When linearizing, a merge commit itself is not picked,
> +                        * but refs that point to it might need updating.
> +                        */

Is it worth pointing out that last_commit is intentionally not updated
by this code path?  That is implied by your comment, but it takes a
bit of reasoning to get there, and I think it might help future
readers to just explicitly state it.

> +               } else {
> +                       struct commit *to_pick = reverse ? last_commit : onto;
> +                       last_commit =
> +                               pick_regular_commit(revs->repo, commit,
> +                                                   replayed_commits, to_pick,
> +                                                   &merge_opt, &result,
> +                                                   opts->linearize ? last_commit : NULL,
> +                                                   reverse, opts->empty);
> +               }
>
> -               last_commit = pick_regular_commit(revs->repo, commit, replayed_commits,
> -                                                 reverse ? last_commit : onto,
> -                                                 &merge_opt, &result, reverse, opts->empty);
>                 if (!last_commit)
>                         break;
>
> diff --git a/replay.h b/replay.h
> index 1851a07705..07e6fdcca3 100644
> --- a/replay.h
> +++ b/replay.h
> @@ -62,6 +62,11 @@ struct replay_revisions_options {
>          * Defaults to REPLAY_EMPTY_COMMIT_DROP.
>          */
>         enum replay_empty_commit_action empty;
> +
> +       /*
> +        * Whether to linearize the commits (i.e. drop merge commits).
> +        */
> +       int linearize;
>  };
>
>  /* This struct is used as an out-parameter by `replay_revisions()`. */
> diff --git a/t/t3650-replay-basics.sh b/t/t3650-replay-basics.sh
> index 3353bc4a4d..64e0731188 100755
> --- a/t/t3650-replay-basics.sh
> +++ b/t/t3650-replay-basics.sh
> @@ -565,4 +565,30 @@ test_expect_success '--onto with --ref rejects multiple revision ranges' '
>         test_grep "cannot be used with multiple revision ranges" err
>  '
>
> +test_expect_success 'replay merge commit fails' '
> +       echo "fatal: replaying merge commits is not supported yet!" >expect &&
> +       test_must_fail git replay --ref-action=print --onto main I..P 2>actual &&
> +       test_cmp expect actual
> +'
> +
> +test_expect_success 'replay to rebase merge commit with --linearize' '
> +       git replay --ref-action=print --linearize --onto main I..topic-with-merge >result &&
> +
> +       test_line_count = 1 result &&
> +
> +       git log --format=%s $(cut -f 3 -d " " result) >actual &&
> +       test_write_lines O N J M L B A >expect &&
> +       test_cmp expect actual
> +'
> +
> +test_expect_success 'replay to rebase merge commit with --linearize down to root commit' '
> +       git replay --ref-action=print --linearize --onto main A..topic-with-merge >result &&

You'd need to drop "A.." to have it go down to the root commit, as
Junio mentioned elsewhere.

> +
> +       test_line_count = 1 result &&
> +
> +       git log --format=%s $(cut -f 3 -d " " result) >actual &&
> +       test_write_lines O N J I M L B A >expect &&
> +       test_cmp expect actual
> +'
> +
>  test_done

Should there also be a testcase combining --linearize and --advance?

Should there be a test with the incompatibility of --revert &
--linearize?  I think we have a few other tests for incompatible
options.

One additional testing idea, borrowed from an older variant of
this patch I had sitting in a local branch (dscho's original
linearize patch, adapted): in addition to checking specific commit
subjects, it's worth verifying that the linearized chain produces
the *same patches* as the original.  Something along the lines of:

        test_expect_success '--linearize preserves patches' '
                test_when_finished "git update-ref -d refs/heads/merge_I_L" &&
                test_tick &&
                git checkout -b merge_I_L I &&
                git merge --no-edit L &&

                git replay --linearize --onto A B..merge_I_L &&

                # range-diff ignores merges, so the original
                # {I, L, merge} reduces to {I, L} on the LHS,
                # and the replayed chain on the RHS should match.
                git range-diff B..merge_I_L@{1} B..merge_I_L >out &&
                ! test_grep -v "=" out &&

                git log --oneline A..merge_I_L >out &&
                test_line_count = 2 out
        '

The range-diff check is nice because it asserts patch equivalence
rather than tying the test to a particular replay ordering, which
makes the test less brittle if the rev-walk order ever changes.
Feel free to take, adapt, or ignore.

Anyway, thanks for working on this; looking good.

Elijah

^ permalink raw reply

* [PATCH] commit-graph: use timestamp_t for max parent generation accumulator
From: Elijah Newren via GitGitGadget @ 2026-06-14  6:57 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

compute_reachable_generation_numbers() computes each commit's
generation as

    max(c->date, max(parent.generation)) + 1

by walking its parents and accumulating their generations into a
local

    uint32_t max_gen = 0;

while info->get_generation() returns timestamp_t and
compute_generation_from_max() already takes its max_gen parameter
as timestamp_t.  For v1 (topological levels) the narrowing is
harmless because GENERATION_NUMBER_V1_MAX is less than 2^30, but
for v2 (corrected committer dates) it silently truncates any
parent generation that does not fit in 32 bits, i.e. any parent
whose committer timestamp is at or beyond 2106-02-07 UTC
(>= 2^32).

The truncated max then causes child commits to end up with a
corrected committer date that matches the parent's instead of being
at least 1 higher.  The bad value gets written into the commit-graph
and causes problems later, and can be noticed by running `git
commit-graph verify`.

Widen the accumulator to timestamp_t.

This is solely an in-memory arithmetic fix with no on-disk format
change: the on-disk format already encodes timestamp_t values and
existing readers handle them unchanged.  This merely allows the code to
compute the correct value to write to disk.

The narrowing was introduced in 80c928d947c2 (commit-graph:
simplify compute_generation_numbers(), 2023-03-20), which rewired
v2 to use the shared compute_reachable_generation_numbers()
helper; the helper's local accumulator had been declared uint32_t
in the immediately preceding 368d19b0b7fa (commit-graph: refactor
compute_topological_levels(), 2023-03-20) when only v1 was using
it, where it was harmless.

Add a new test with a future-dated parent and a present-day child;
without the above fix, `git commit-graph verify` reports the
descendant's stored generation as below parent + 1.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
    commit-graph: use timestamp_t for max parent generation accumulator
    
    We found a few repositories in the wild with commits whose authors were
    apparently on a computer in the year 2120 when they recorded their
    commits. Apparently, in a century from now, some folks are going to have
    a really weird timezone as well (-13068837), though the timezone doesn't
    factor into this patch at all.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2148%2Fnewren%2Fcommit-graph-fix-ccd-uint32-truncation-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2148/newren/commit-graph-fix-ccd-uint32-truncation-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2148

 commit-graph.c                     | 2 +-
 t/t5328-commit-graph-64bit-time.sh | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 9abe62bd5a..4b7156fd76 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1669,7 +1669,7 @@ static void compute_reachable_generation_numbers(
 			struct commit *current = list->item;
 			struct commit_list *parent;
 			int all_parents_computed = 1;
-			uint32_t max_gen = 0;
+			timestamp_t max_gen = 0;
 
 			for (parent = current->parents; parent; parent = parent->next) {
 				repo_parse_commit(info->r, parent->item);
diff --git a/t/t5328-commit-graph-64bit-time.sh b/t/t5328-commit-graph-64bit-time.sh
index d8891e6a92..bc651b69de 100755
--- a/t/t5328-commit-graph-64bit-time.sh
+++ b/t/t5328-commit-graph-64bit-time.sh
@@ -74,6 +74,15 @@ test_expect_success 'single commit with generation data exceeding UINT32_MAX' '
 	git -C repo-uint32-max commit-graph verify
 '
 
+test_expect_success 'descendant of commit with date exceeding UINT32_MAX' '
+	git init repo-uint32-max-descendant &&
+	test_commit -C repo-uint32-max-descendant \
+		--date "@4294967300 +0000" future-parent &&
+	test_commit -C repo-uint32-max-descendant present-day-child &&
+	git -C repo-uint32-max-descendant commit-graph write --reachable &&
+	git -C repo-uint32-max-descendant commit-graph verify
+'
+
 test_expect_success PERL_TEST_HELPERS 'reader notices out-of-bounds generation overflow' '
 	graph=.git/objects/info/commit-graph &&
 	test_when_finished "rm -rf $graph" &&

base-commit: 600fe743028cbfb640855f659e9851522214bc0b
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH] clone: accept DEPTH env var as fallback for --depth
From: Hadrien Loge @ 2026-06-14  8:13 UTC (permalink / raw)
  To: sandals; +Cc: git, gitgitgadget, gitster, hadean-eon-dev, Hadrien Loge, m

> So say someone has this set in their environment and then they run a
script that clones a repository and runs `git describe`.  That no longer
works and the script fails because it assumed that it had history.

That is insightful but that I fail to see how that makes `git` less capable.
As Junio pointed there already other ENV vars that do _less_ useful stuff

As to breakage, this is mostly for packaging in clean chroots.
If a "history" feature is asked, obviously you skip setting this.

And yes the packages need network to build because they
are mostly user generated. For example applying patches
to other repos, before building tar archives.

I also fail to see how that would affect existing flows at all?

> because shallow fetches are _extremely_ expensive to serve

That seems like an unrelated issue to my patch yet is also interesting
as to why that would be the case. I might actually want to dig into.
Because to me, I was saving electrons, at least on my end.
And saving time to other packagers.

Thanks for the thoughtful reply. I hope this patch can still be considered
Hade

^ permalink raw reply

* Re: [RFC] commit-reach: terminate merge-base walk when one paint side is exhausted
From: Kristofer Karlsson @ 2026-06-14 11:47 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, Derrick Stolee
In-Reply-To: <CABPp-BGq8a-3ocJ+1HCgJutw1SBUvFg6YxtUamryfgEMx3qDYQ@mail.gmail.com>

On Sun, 14 Jun 2026 at 06:32, Elijah Newren <newren@gmail.com> wrote:
> Wow...it appears this optimization was discovered by 3 separate
> people in the last month. This optimization was implemented and
> is live at GitHub...but it feels incomplete to me because my
> version doesn't handle both sides having an infinite generation
> number (it just falls back to the old algorithm when that
> happens).

It is nice to know that multiple people converged on the same idea
independently -- that gives me a lot of confidence that we are
exploiting a real and useful property of the graph/walk structure.

> I uploaded my version at
> https://github.com/gitgitgadget/git/pull/2150.

I uploaded mine at
https://github.com/gitgitgadget/git/pull/2149 (draft, still
iterating on the series).

I realize this is getting into implementation details before
the idea itself has been discussed on the list. I am happy to wait
with a formal patch submission until there is more consensus on the
approach -- but since you shared your implementation, I wanted to
compare notes while it is fresh.

I integrated your new t6600 test cases into my branch -- thanks!
They exercise important edge cases that my original tests missed.
I also extended the perf test to cover the case where both tips
are outside the commit-graph.

After looking at your implementation, I also moved from a
max-pointer scheme to per-side counters. We ended up with slightly
different implementations, but they are tracking the same underlying
condition: whether either paint side still has non-stale exclusive
commits remaining.

The main behavioral difference is handling of commits outside the
commit-graph. Your version disables the optimization once both
sides have touched such commits, while mine only enables the break
after the walk reaches the finite-generation region. This still
allows the optimization to fire when both tips start outside the
graph, as soon as the walk crosses into commits covered by the
commit-graph.

> Do you want to take this over, rebase it, and extend to the
> infinite generation number case? Or do you want me to rebase
> and see it through after my vacation? Or some other mixture?

I am happy to keep working on this -- starting from either your
branch or mine, or some hybrid. I do not have a strong preference
for which version it would be based on, but if either of them lands
I would be happy. Enjoy your vacation!

Thanks,
Kristofer

^ permalink raw reply

* [PATCH] builtin/history: unuse the commit buffer after use
From: Kaartic Sivaraam @ 2026-06-14 14:15 UTC (permalink / raw)
  To: Git mailing list; +Cc: Patrick Steinhardt

While running `git history reword` using a Git built with `SANITIZE` flag set
to `address,leak`, we could observe the following leak being reported:

-- 8< --

=================================================================
==7156==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1813 byte(s) in 1 object(s) allocated from:
    #0 0x79a3d1d2b60f in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:67
    #1 0x612e2bb8c2e9 in do_xmalloc /path/to/git/wrapper.c:55
    #2 0x612e2bb8c3f7 in do_xmallocz /path/to/git/wrapper.c:89
    #3 0x612e2bb8c48f in xmallocz_gently /path/to/git/wrapper.c:102
    #4 0x612e2b8dc28e in unpack_compressed_entry /path/to/git/packfile.c:1744
    #5 0x612e2b8dd12a in unpack_entry /path/to/git/packfile.c:1897
    #6 0x612e2b8daae2 in cache_or_unpack_entry /path/to/git/packfile.c:1535
    #7 0x612e2b8db1f6 in packed_object_info_with_index_pos /path/to/git/packfile.c:1617
    #8 0x612e2b8dc1c4 in packed_object_info /path/to/git/packfile.c:1732
    #9 0x612e2b8def05 in packfile_store_read_object_info /path/to/git/packfile.c:2228
    #10 0x612e2b889cb0 in odb_source_files_read_object_info odb/source-files.c:58
    #11 0x612e2b8805e9 in odb_source_read_object_info odb/source.h:326
    #12 0x612e2b885cf0 in do_oid_object_info_extended /path/to/git/odb.c:572
    #13 0x612e2b886fe4 in odb_read_object_info_extended /path/to/git/odb.c:710
    #14 0x612e2b887584 in odb_read_object /path/to/git/odb.c:756
    #15 0x612e2b68d6e4 in repo_get_commit_buffer /path/to/git/commit.c:399
    #16 0x612e2b91a7d4 in repo_logmsg_reencode /path/to/git/pretty.c:716
    #17 0x612e2b3de7da in commit_tree_ext builtin/history.c:127
    #18 0x612e2b3dee9f in commit_tree_with_edited_message builtin/history.c:183
    #19 0x612e2b3e2c4d in cmd_history_reword builtin/history.c:717
    #20 0x612e2b3e53b6 in cmd_history builtin/history.c:998
    #21 0x612e2b27ae97 in run_builtin /path/to/git/git.c:506
    #22 0x612e2b27b9ae in handle_builtin /path/to/git/git.c:782
    #23 0x612e2b27c240 in run_argv /path/to/git/git.c:865
    #24 0x612e2b27cd94 in cmd_main /path/to/git/git.c:986
    #25 0x612e2b5c4267 in main /path/to/git/common-main.c:9
    #26 0x79a3d182a600 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:59
    #27 0x79a3d182a717 in __libc_start_main_impl ../csu/libc-start.c:360
    #28 0x612e2b276124 in _start (/path/to.local/bin/git+0x211124) (BuildId: 8da3d640a944e21b895fc4802d7942b1505be663)

SUMMARY: AddressSanitizer: 1813 byte(s) leaked in 1 allocation(s).

-- >8 --

A deeper investigation on this reveals the following as the root cause.

As part of rewording a commit in `git history`, we get the commit message
buffer in the `commit_tree_ext` function. This in turn obtains the buffer
from `repo_logmsg_reencode`. Given how `commit_tree_ext` is invoking the
function with the last two parameters as NULL, we are clearly not expecting
a reencode to happen. In this case, the buffer that we receive from
`repo_logmsg_reencode` ends up always being obtained from a call to
`repo_get_commit_buffer`.

This buffer is expected to be released with an accompanying call to
`repo_unuse_commit_buffer` which takes care of freeing it. This call
is missing in the `commit_tree_ext` flow thus resulting in the leak.

Fix this by ensuring we call `repo_unuse_commit_buffer` on the
original_message buffer.

Signed-off-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
---
I must mention that I also noticed the following comment in `commit_tree_ext`:

»       /* We retain authorship of the original commit. */
»       original_message = repo_logmsg_reencode(repo, commit_with_message, NULL, NULL);

... but I'm not quite sure why we don't unuse the buffer after its purpose is
done. Kindly englighten me in case I missed something.


 builtin/history.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/history.c b/builtin/history.c
index 091465a59e..0e9259b5d7 100644
--- a/builtin/history.c
+++ b/builtin/history.c
@@ -154,6 +154,7 @@ static int commit_tree_ext(struct repository *repo,
 	free_commit_extra_headers(original_extra_headers);
 	strbuf_release(&commit_message);
 	free(original_author);
+	repo_unuse_commit_buffer(repo, commit_with_message, original_message);
 	return ret;
 }
 
-- 
2.55.0.rc0.738.g0c8ab3ebcc.dirty


^ permalink raw reply related

* Re: [PATCH v2 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
From: Junio C Hamano @ 2026-06-14 15:22 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Patrick Steinhardt, Christian Couder, Elijah Newren
In-Reply-To: <cf50f1aabcf84aa755808318756c233305cc008d.1781419047.git.gitgitgadget@gmail.com>

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/cache-tree.c b/cache-tree.c
> index 7881b42aa2..4d2669b312 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -161,6 +161,54 @@ void cache_tree_invalidate_path(struct index_state *istate, const char *path)
>  		istate->cache_changed |= CACHE_TREE_CHANGED;
>  }
>  
> +/*
> + * Check whether this_ce and the next entry in the index form a D/F
> + * conflict ("path" vs "path/file").  Returns the conflicting "path/..."
> + * name when one is found, or NULL otherwise.
> + *
> + * The cache is sorted, so "path/file" sorts after "path" and the
> + * conflict is usually visible as adjacent entries.  But other entries
> + * can sort between them -- e.g. "path-internal" sits between "path"
> + * and "path/file" because '-' (0x2D) precedes '/' (0x2F) -- so when
> + * the immediately following entry shares our prefix but starts with a
> + * character that sorts before '/', binary search for "path/" instead.
> + */
> +static const char *find_df_conflict(struct index_state *istate,
> +				    const struct cache_entry *this_ce,
> +				    const struct cache_entry *next_ce)
> +{
> +	const char *this_name = this_ce->name;
> +	const char *next_name = next_ce->name;
> +	int this_len = ce_namelen(this_ce);
> +	const struct cache_entry *other;
> +	struct strbuf probe = STRBUF_INIT;
> +	int pos;

> +	if (this_len >= ce_namelen(next_ce) ||
> +	    next_name[this_len] > '/' ||
> +	    strncmp(this_name, next_name, this_len))
> +		return NULL;

First we reject an abvious "cannot conflict" case. If the current
one is not strictly shorter than the next one, it cannot be an
overlapping with one of the leading directories in the next one and
the next one cannot be "hiding" an overlapping one in the "docs,
docs-i, docs/r" relationship described in the log message (where
next=="docs-i' hides the fact that this=="docs" conflicts with
"docs/r").  Of course, the current one cannot be such a confliciting
entry, if an earlier part of the next name does not entirely match
the current name.  In addition, if the byte after the matching
leading part in the next name is larger than '/', any entry that
comes later than the next name cannot have '/' at that byte position.

Makes sense.

> +	if (next_name[this_len] == '/')
> +		return next_name;

And if we see '/' there, we definitely have conflict right there.

So these trivial cases after us, we need to see if an entry that
comes after next has a name whose leading part is identical to this
name, plus '/'.  How would we compute it?

> +	strbuf_add(&probe, this_name, this_len);
> +	strbuf_addch(&probe, '/');
> +	pos = index_name_pos_sparse(istate, probe.buf, probe.len);
> +	strbuf_release(&probe);

We see where the first of such a name (i.e. this name followed by '/')
may appear.  Normal cache entries in the index would never have a
trailing slash in their names, so I expect that this would either
find an directory that is sparsed out and returns a non-negative
index into the .cache[] array, or a negative index that indicates
where a name like this + '/' + anything would appear.

> +
> +	if (pos < 0)
> +		pos = -pos - 1;

So in order to check both cases, we do the usual index flipping.  We
do not need to to treat "it exists---that's sparse dir" case and "it
is expanded and we are trying to find the first entry in that index"
case differently below, which simplifies things a bit.

> +	if (pos >= (int)istate->cache_nr)
> +		return NULL;

OK, such an entry whose name is this followed by '/' does not exist
so we are safe.

> +	other = istate->cache[pos];
> +	if (ce_namelen(other) > this_len &&
> +	    other->name[this_len] == '/' &&
> +	    !strncmp(this_name, other->name, this_len))
> +		return other->name;
> +	return NULL;

This looks very similar to the inverse of the earlier one but
slightly different.  It might be easier to spot where it differs, if
we flipped the polarity, like so, perhaps?

	other = istate->cache[pos];
	if (this_len >= ce_namelen(other) ||
	    other->name[this_len] != '/' ||
	    strncmp(this_name, other->name, this_len))
		return NULL;

	return other->name;

Or perhaps not.  I do not care very strongly, but I only spotted the
subtle difference while attempting to flip the polarity in my head,
so it might help ther readers the same way.  I dunno.

Thanks, looking very good.


^ permalink raw reply

* [PATCH] cat-file: speed up default format
From: René Scharfe @ 2026-06-14 16:28 UTC (permalink / raw)
  To: Git List

eb54a3391b (cat-file: skip expanding default format, 2022-03-15) added
special handling for the default batch format.  In the meantime it has
fallen behind the code path for handling arbitrary formats.  Bring it up
to speed by using the new and more efficient strbuf_add_oid_hex() and
strbuf_add_uint() instead of strbuf_addf():

Benchmark 1: ./git_main cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype) %(objectsize)'
  Time (mean ± σ):      1.051 s ±  0.003 s    [User: 1.027 s, System: 0.023 s]
  Range (min … max):    1.049 s …  1.058 s    10 runs

Benchmark 2: ./git_main cat-file --batch-all-objects --batch-check='%(objectname)-%(objecttype)-%(objectsize)'
  Time (mean ± σ):      1.012 s ±  0.002 s    [User: 0.988 s, System: 0.023 s]
  Range (min … max):    1.010 s …  1.018 s    10 runs

Benchmark 3: ./git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype) %(objectsize)'
  Time (mean ± σ):     979.0 ms ±   1.1 ms    [User: 954.1 ms, System: 23.2 ms]
  Range (min … max):   977.7 ms … 980.8 ms    10 runs

Summary
  ./git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype) %(objectsize)' ran
    1.03 ± 0.00 times faster than ./git_main cat-file --batch-all-objects --batch-check='%(objectname)-%(objecttype)-%(objectsize)'
    1.07 ± 0.00 times faster than ./git_main cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype) %(objectsize)'

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/cat-file.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 2b64f8f733..d7f7895e30 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -461,9 +461,12 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 static void print_default_format(struct strbuf *scratch, struct expand_data *data,
 				 struct batch_options *opt)
 {
-	strbuf_addf(scratch, "%s %s %"PRIuMAX"%c", oid_to_hex(&data->oid),
-		    type_name(data->type),
-		    (uintmax_t)data->size, opt->output_delim);
+	strbuf_add_oid_hex(scratch, &data->oid);
+	strbuf_addch(scratch, ' ');
+	strbuf_addstr(scratch, type_name(data->type));
+	strbuf_addch(scratch, ' ');
+	strbuf_add_uint(scratch, data->size);
+	strbuf_addch(scratch, opt->output_delim);
 }
 
 static void report_object_status(struct batch_options *opt,
-- 
2.54.0

^ permalink raw reply related

* [PATCH] doc: fix a small, old release notes typo
From: D. Ben Knoble @ 2026-06-14 17:28 UTC (permalink / raw)
  To: git; +Cc: D. Ben Knoble, Elijah Newren

Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
---
No harm done if you choose not to keep this, I think. Stumbled upon it when
trying to understand Elijah's message [1] about timestamp_t overflowing in 2106
(I though 32-bit time_t overflowed in 2038, but timestamp_t is something
different… except maybe when it's not? Anyway…)

[1]: <pull.2148.git.1781420271100.gitgitgadget@gmail.com>

 Documentation/RelNotes/2.14.0.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/RelNotes/2.14.0.adoc b/Documentation/RelNotes/2.14.0.adoc
index 2711a2529d..182fbe6179 100644
--- a/Documentation/RelNotes/2.14.0.adoc
+++ b/Documentation/RelNotes/2.14.0.adoc
@@ -142,7 +142,7 @@ Performance, Internal Implementation, Development Support etc.
    historical use of ulong for timestamp would mean they cannot
    represent some timestamp that the platform allows.  Invent a
    separate and dedicated timestamp_t (so that we can distinguish
-   timestamps and a vanilla ulongs, which along is already a good
+   timestamps and a vanilla ulongs, which alone is already a good
    move), and then declare uintmax_t is the type to be used as the
    timestamp_t.
 

base-commit: 0c8ab3ebcc76981376809c8fe632d0fe18e93347
-- 
2.54.0.1136.gdb2ca164c4.dirty


^ permalink raw reply related

* [PATCH 0/2] Silence po catalog output under "make -s"
From: Harald Nordgren via GitGitGadget @ 2026-06-14 17:52 UTC (permalink / raw)
  To: git; +Cc: Harald Nordgren

The gitk and git-gui catalog rules sent msgfmt --statistics output (and a
"Generating catalog" line) to stderr, so it survived "make -s". Emit it only
when "-s" is absent, keeping a quiet build silent and a verbose build
unchanged.

Harald Nordgren (2):
  gitk: silence catalog output under "make -s"
  git-gui: silence statistics under "make -s"

 git-gui/Makefile  |  3 ++-
 gitk-git/Makefile | 10 ++++++++--
 2 files changed, 10 insertions(+), 3 deletions(-)


base-commit: ea97ad8d017de0c9037451a78008a0fd60abea0c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2339%2FHaraldNordgren%2Fsilence-catalog-output-under-make-s-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2339/HaraldNordgren/silence-catalog-output-under-make-s-v1
Pull-Request: https://github.com/git/git/pull/2339
-- 
gitgitgadget

^ permalink raw reply

* [PATCH 1/2] gitk: silence catalog output under "make -s"
From: Harald Nordgren via GitGitGadget @ 2026-06-14 17:52 UTC (permalink / raw)
  To: git; +Cc: Harald Nordgren, Harald Nordgren
In-Reply-To: <pull.2339.git.git.1781459539.gitgitgadget@gmail.com>

From: Harald Nordgren <haraldnordgren@gmail.com>

The catalog rule unconditionally echoed "Generating catalog" and ran
msgfmt with --statistics, whose output went to stderr and so survived
"make -s".

Emit the "Generating catalog" line and the msgfmt statistics only when
"-s" is absent, leaving a quiet build silent while default and V=1
builds are unchanged.

Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
---
 gitk-git/Makefile | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/gitk-git/Makefile b/gitk-git/Makefile
index 41116d8a14..10078c54d4 100644
--- a/gitk-git/Makefile
+++ b/gitk-git/Makefile
@@ -43,10 +43,16 @@ PO_TEMPLATE = po/gitk.pot
 ALL_POFILES = $(wildcard po/*.po)
 ALL_MSGFILES = $(subst .po,.msg,$(ALL_POFILES))
 
+MSGFMT_GEN = @:
+
+ifneq ($(findstring s,$(firstword -$(MAKEFLAGS))),s)
 ifndef V
 	QUIET          = @
 	QUIET_GEN      = $(QUIET)echo '   ' GEN $@ &&
 endif
+	MSGFMT_GEN   = @echo Generating catalog $@
+	MSGFMT_STATS = --statistics
+endif
 
 all:: gitk-wish $(ALL_MSGFILES)
 
@@ -75,8 +81,8 @@ update-po:: $(PO_TEMPLATE)
 	echo; \
 	echo "	git config filter.gettext-no-location.clean \"msgcat --no-location -\""
 $(ALL_MSGFILES): %.msg : %.po
-	@echo Generating catalog $@
-	$(MSGFMT) --statistics --tcl -l $(basename $(notdir $<)) -d $(dir $@) $<
+	$(MSGFMT_GEN)
+	$(MSGFMT) $(MSGFMT_STATS) --tcl -l $(basename $(notdir $<)) -d $(dir $@) $<
 
 .PHONY: all install uninstall clean update-po
 .PHONY: FORCE
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH 2/2] git-gui: silence statistics under "make -s"
From: Harald Nordgren via GitGitGadget @ 2026-06-14 17:52 UTC (permalink / raw)
  To: git; +Cc: Harald Nordgren, Harald Nordgren
In-Reply-To: <pull.2339.git.git.1781459539.gitgitgadget@gmail.com>

From: Harald Nordgren <haraldnordgren@gmail.com>

The catalog rule passed --statistics to msgfmt unconditionally, and its
output went to stderr, so it survived "make -s".

Pass --statistics only when "-s" is absent, leaving a quiet build silent
while default and V=1 builds are unchanged.

Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
---
 git-gui/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/git-gui/Makefile b/git-gui/Makefile
index d33204e875..76245fa84b 100644
--- a/git-gui/Makefile
+++ b/git-gui/Makefile
@@ -90,6 +90,7 @@ ifndef V
 	REMOVE_F0 = dst=
 	REMOVE_F1 = && echo '   ' REMOVE `basename "$$dst"` && $(RM_RF) "$$dst"
 endif
+	MSGFMT_STATS = --statistics
 endif
 
 TCLTK_PATH ?= wish
@@ -155,7 +156,7 @@ $(PO_TEMPLATE): $(SCRIPT_SH) $(ALL_LIBFILES)
 update-po:: $(PO_TEMPLATE)
 	$(foreach p, $(ALL_POFILES), echo Updating $p ; msgmerge -U $p $(PO_TEMPLATE) ; )
 $(ALL_MSGFILES): %.msg : %.po
-	$(QUIET_MSGFMT0)$(MSGFMT) --statistics --tcl -l $(basename $(notdir $<)) -d $(dir $@) $< $(QUIET_MSGFMT1)
+	$(QUIET_MSGFMT0)$(MSGFMT) $(MSGFMT_STATS) --tcl -l $(basename $(notdir $<)) -d $(dir $@) $< $(QUIET_MSGFMT1)
 
 lib/tclIndex: $(ALL_LIBFILES) generate-tclindex.sh GIT-GUI-BUILD-OPTIONS
 	$(QUIET_INDEX)$(SHELL_PATH) generate-tclindex.sh . ./GIT-GUI-BUILD-OPTIONS $(ALL_LIBFILES)
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v4 0/6] [RFC] diff: add diff.<driver>.process for external hunk providers
From: Michael Montalbo via GitGitGadget @ 2026-06-14 18:59 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Michael Montalbo
In-Reply-To: <pull.2120.v3.git.1780087700.gitgitgadget@gmail.com>

Language-aware diff tools (e.g., Difftastic) and format-specific analyzers
can produce better line matching than Git's builtin diff algorithm, but
diff.<driver>.command replaces Git's output entirely, losing downstream
features like word diff, function context, color, and blame.

This series adds diff.<driver>.process, a long-running subprocess protocol
that lets an external tool control which lines Git considers changed while
Git handles all output formatting. The protocol follows
filter.<driver>.process: pkt-line over stdin/stdout, capability negotiation,
one process per Git invocation.

The tool receives both file versions and returns changed regions (line
ranges in the old and new file). Git validates and feeds them into the xdiff
pipeline in place of the builtin diff algorithm. When the tool returns no
hunks, Git treats the files as having no changes.

 * Patch 1: xdiff plumbing for externally supplied hunks.
 * Patch 2: diff.<driver>.process config key.
 * Patch 3: refactor subprocess API to separate process lifecycle from
   hashmap management, since the diff process stores its subprocess on the
   userdiff driver rather than in a hashmap.
 * Patch 4: the main feature.
 * Patch 5: bypass knobs (--no-ext-diff, format-patch).
 * Patch 6: blame integration so the tool can declare commits as having no
   changes.

Changes since v3:

 * Replaced Python test backend with C test-tool helper (thanks to Johannes
   Schindelin).
 * Added test coverage cases for deleted file, malformed hunk line, and
   missing capability.
 * Fixed potential overflow in synchronization invariant check by counting
   from changed[] arrays instead of accumulating.
 * Accept start=0 with count=0 in the hunk protocol, matching what git diff
   itself emits for empty file sides.
 * Warn on external hunk validation failure with specific reasons (range
   exceeded, overlap, sync mismatch) to help tool authors debug their
   implementations.
 * Test backend follows the same convention (start=0 when count=0 for empty
   file sides).

Michael Montalbo (6):
  xdiff: support external hunks via xpparam_t
  userdiff: add diff.<driver>.process config
  sub-process: separate process lifecycle from hashmap management
  diff: add long-running diff process via diff.<driver>.process
  diff: bypass diff process with --no-ext-diff and in format-patch
  blame: consult diff process for no-hunk detection

 Documentation/config/diff.adoc           |   5 +
 Documentation/diff-algorithm-option.adoc |   3 +
 Documentation/diff-options.adoc          |   4 +-
 Documentation/gitattributes.adoc         | 143 ++++++
 Makefile                                 |   2 +
 blame.c                                  |  40 +-
 builtin/log.c                            |   7 +
 diff-process.c                           | 297 ++++++++++++
 diff-process.h                           |  39 ++
 diff.c                                   |  29 +-
 diff.h                                   |   5 +
 meson.build                              |   1 +
 sub-process.c                            |  28 +-
 sub-process.h                            |   9 +-
 t/helper/meson.build                     |   1 +
 t/helper/test-diff-process-backend.c     | 299 ++++++++++++
 t/helper/test-tool.c                     |   1 +
 t/helper/test-tool.h                     |   1 +
 t/meson.build                            |   1 +
 t/t4080-diff-process.sh                  | 553 +++++++++++++++++++++++
 userdiff.c                               |   7 +
 userdiff.h                               |   5 +
 xdiff-interface.c                        |   7 +-
 xdiff/xdiff.h                            |  14 +
 xdiff/xdiffi.c                           | 123 ++++-
 xdiff/xprepare.c                         |  10 +
 xdiff/xprepare.h                         |   1 +
 27 files changed, 1614 insertions(+), 21 deletions(-)
 create mode 100644 diff-process.c
 create mode 100644 diff-process.h
 create mode 100644 t/helper/test-diff-process-backend.c
 create mode 100755 t/t4080-diff-process.sh


base-commit: ea97ad8d017de0c9037451a78008a0fd60abea0c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2120%2Fmmontalbo%2Fmm%2Fstructural-diff-backend-clean-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2120/mmontalbo/mm/structural-diff-backend-clean-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/2120

Range-diff vs v3:

 1:  13eb201d63 ! 1:  03f261dfe2 xdiff: support external hunks via xpparam_t
     @@ xdiff/xdiff.h: typedef struct s_mmbuffer {
       
      +/*
      + * Hunk descriptor for externally computed diffs.
     -+ * Line numbers are 1-based, matching unified diff convention.
     ++ * Line numbers are 1-based; a start of 0 is accepted when
     ++ * count is 0 (empty file side, matching git diff output).
      + */
      +struct xdl_hunk {
      +	long old_start, old_count;
     @@ xdiff/xdiffi.c: static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdf
      +{
      +	size_t i;
      +	long j, prev_old_end = 0, prev_new_end = 0;
     -+	long total_old = 0, total_new = 0;
     ++	long changed_old = 0, changed_new = 0;
      +
      +	/*
      +	 * xdl_prepare_env() may dirty changed[] via xdl_cleanup_records().
     @@ xdiff/xdiffi.c: static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdf
      +	for (i = 0; i < nr_hunks; i++) {
      +		struct xdl_hunk *h = &hunks[i];
      +
     -+		if (h->old_count < 0 || h->new_count < 0)
     ++		if (h->old_count < 0 || h->new_count < 0) {
     ++			warning("diff process hunk %"PRIuMAX": "
     ++				"negative count (old=%ld, new=%ld)",
     ++				(uintmax_t)(i + 1),
     ++				h->old_count, h->new_count);
      +			return -1;
     -+		if (h->old_start < 1 || h->new_start < 1)
     ++		}
     ++		if (h->old_start < 1 || h->new_start < 1) {
     ++			warning("diff process hunk %"PRIuMAX": "
     ++				"start must be >= 1 (old=%ld, new=%ld)",
     ++				(uintmax_t)(i + 1),
     ++				h->old_start, h->new_start);
      +			return -1;
     ++		}
      +
      +		/*
      +		 * Range must fit: start + count - 1 <= nrec,
     @@ xdiff/xdiffi.c: static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdf
      +		 * start > nrec + 1 and allows start == nrec + 1
      +		 * (the position after the last line).
      +		 */
     -+		if (h->old_count > (long)xe->xdf1.nrec - h->old_start + 1)
     ++		if (h->old_count > (long)xe->xdf1.nrec - h->old_start + 1) {
     ++			warning("diff process hunk %"PRIuMAX": "
     ++				"old range %ld+%ld exceeds %lu lines",
     ++				(uintmax_t)(i + 1),
     ++				h->old_start, h->old_count,
     ++				(unsigned long)xe->xdf1.nrec);
      +			return -1;
     -+		if (h->new_count > (long)xe->xdf2.nrec - h->new_start + 1)
     ++		}
     ++		if (h->new_count > (long)xe->xdf2.nrec - h->new_start + 1) {
     ++			warning("diff process hunk %"PRIuMAX": "
     ++				"new range %ld+%ld exceeds %lu lines",
     ++				(uintmax_t)(i + 1),
     ++				h->new_start, h->new_count,
     ++				(unsigned long)xe->xdf2.nrec);
      +			return -1;
     ++		}
      +
      +		/* Ordering: no overlap with previous hunk (adjacent is OK) */
      +		if (h->old_start < prev_old_end ||
     -+		    h->new_start < prev_new_end)
     ++		    h->new_start < prev_new_end) {
     ++			warning("diff process hunk %"PRIuMAX": "
     ++				"overlaps with previous hunk",
     ++				(uintmax_t)(i + 1));
      +			return -1;
     ++		}
      +
      +		for (j = 0; j < h->old_count; j++)
      +			xe->xdf1.changed[h->old_start - 1 + j] = true;
     @@ xdiff/xdiffi.c: static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdf
      +
      +		prev_old_end = h->old_start + h->old_count;
      +		prev_new_end = h->new_start + h->new_count;
     -+		total_old += h->old_count;
     -+		total_new += h->new_count;
      +	}
      +
      +	/*
      +	 * Synchronization invariant: unchanged line counts must match.
      +	 * Otherwise xdl_build_script() would walk off one array.
     ++	 *
     ++	 * Count changed lines from the arrays rather than accumulating
     ++	 * during the loop to avoid any overflow in the summation.
      +	 */
     -+	if ((long)xe->xdf1.nrec - total_old !=
     -+	    (long)xe->xdf2.nrec - total_new)
     ++	for (j = 0; j < (long)xe->xdf1.nrec; j++)
     ++		if (xe->xdf1.changed[j])
     ++			changed_old++;
     ++	for (j = 0; j < (long)xe->xdf2.nrec; j++)
     ++		if (xe->xdf2.changed[j])
     ++			changed_new++;
     ++	if ((long)xe->xdf1.nrec - changed_old !=
     ++	    (long)xe->xdf2.nrec - changed_new) {
     ++		warning("diff process: unchanged line count mismatch "
     ++			"(old: %ld unchanged, new: %ld unchanged)",
     ++			(long)xe->xdf1.nrec - changed_old,
     ++			(long)xe->xdf2.nrec - changed_new);
      +		return -1;
     ++	}
      +
      +	return 0;
      +}
 2:  58f4763c63 = 2:  30617ee17b userdiff: add diff.<driver>.process config
 3:  d6c833dd42 ! 3:  459e485e6d sub-process: separate process lifecycle from hashmap management
     @@ Commit message
          and subprocess_stop() become thin wrappers that add hashmap
          operations on top.
      
     -    No functional change for existing callers.
     -
          Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
      
       ## sub-process.c ##
     @@ sub-process.c: void subprocess_stop(struct hashmap *hashmap, struct subprocess_e
       	kill(entry->process.pid, SIGTERM);
       	finish_command(&entry->process);
      +}
     -+
     + 
      +void subprocess_stop(struct hashmap *hashmap, struct subprocess_entry *entry)
      +{
      +	if (!entry)
      +		return;
     - 
     ++
      +	subprocess_stop_command(entry);
       	hashmap_remove(hashmap, &entry->ent, NULL);
       }
     @@ sub-process.c: int subprocess_start(struct hashmap *hashmap, struct subprocess_e
      +	int err;
      +
      +	err = subprocess_start_command(entry, cmd, startfn);
     -+	if (err) {
     ++	if (err)
      +		return err;
     -+	}
      +
      +	hashmap_entry_init(&entry->ent, strhash(cmd));
       	hashmap_add(hashmap, &entry->ent);
 4:  d044fa0ee5 ! 4:  10b3980f59 diff: add long-running diff process via diff.<driver>.process
     @@ Commit message
          textconv-transformed content.  The tool controls which lines
          are marked as changed while the display shows the file content.
          Patch output features (word diff, function context, color) work
     -    normally; summary formats like --stat use their own diff path
     -    and are not affected.
     +    normally; --stat uses its own diff codepath and never consults
     +    the diff process.
      
          The handshake negotiates version=1 and capability=hunks.  Per-file
          requests send command=hunks, pathname, and both file contents as
     @@ Commit message
          "hunks populated" from "files equivalent" from "not applicable"
          from "tool failure."
      
     +    Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
          Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
      
       ## Documentation/config/diff.adoc ##
     @@ Documentation/gitattributes.adoc: NOTE: If `diff.<name>.command` is defined for
      +at line 3 in the old file were replaced by 4 lines starting at
      +line 3 in the new file.  An `<old_count>` of 0 means no lines were
      +removed (pure insertion); a `<new_count>` of 0 means no lines were
     -+added (pure deletion).
     ++added (pure deletion).  A start value of 0 is accepted when
     ++the corresponding count is 0 (e.g., `hunk 0 0 1 5` for a newly
     ++added file), matching what `git diff` itself emits for empty
     ++file sides.
      +
      +Lines are delimited by newlines.  A file `"foo\nbar\n"` and a
      +file `"foo\nbar"` both have 2 lines.
     @@ Documentation/gitattributes.adoc: NOTE: If `diff.<name>.command` is defined for
      +packet:          git< 0000
      +-----------------------
      +
     -+If the tool returns invalid hunks (out of bounds, overlapping), Git
     -+silently falls back to the builtin diff algorithm.
     ++If the tool returns invalid hunks (out of bounds, overlapping, or
     ++mismatched unchanged line counts), Git warns and falls back to the
     ++builtin diff algorithm.
      +
      +In case the tool cannot or does not want to process the content,
      +it is expected to respond with an "error" status.  Git warns and
     @@ Documentation/gitattributes.adoc: NOTE: If `diff.<name>.command` is defined for
       
      
       ## Makefile ##
     +@@ Makefile: TEST_BUILTINS_OBJS += test-csprng.o
     + TEST_BUILTINS_OBJS += test-date.o
     + TEST_BUILTINS_OBJS += test-delete-gpgsig.o
     + TEST_BUILTINS_OBJS += test-delta.o
     ++TEST_BUILTINS_OBJS += test-diff-process-backend.o
     + TEST_BUILTINS_OBJS += test-dir-iterator.o
     + TEST_BUILTINS_OBJS += test-drop-caches.o
     + TEST_BUILTINS_OBJS += test-dump-cache-tree.o
      @@ Makefile: LIB_OBJS += diff-delta.o
       LIB_OBJS += diff-merges.o
       LIB_OBJS += diff-lib.o
     @@ diff-process.c (new)
      +	if (errno || end == line || *end != '\0')
      +		return -1;
      +
     ++	/*
     ++	 * git diff emits start=0 when count=0 (empty file side).
     ++	 * Normalize to 1-based so downstream validation can assume start >= 1.
     ++	 */
     ++	if (!hunk->old_count && !hunk->old_start)
     ++		hunk->old_start = 1;
     ++	if (!hunk->new_count && !hunk->new_start)
     ++		hunk->new_start = 1;
     ++
      +	return 0;
      +}
      +
     @@ meson.build: libgit_sources = [
         'diffcore-break.c',
         'diffcore-delta.c',
      
     - ## t/.gitattributes ##
     -@@ t/.gitattributes: t[0-9][0-9][0-9][0-9]/* -whitespace
     - /t8005/*.txt eol=lf
     - /t9*/*.dump eol=lf
     - /t0040*.sh whitespace=-indent-with-non-tab
     -+/t4080-diff-process.sh whitespace=-indent-with-non-tab
     + ## t/helper/meson.build ##
     +@@ t/helper/meson.build: test_tool_sources = [
     +   'test-date.c',
     +   'test-delete-gpgsig.c',
     +   'test-delta.c',
     ++  'test-diff-process-backend.c',
     +   'test-dir-iterator.c',
     +   'test-drop-caches.c',
     +   'test-dump-cache-tree.c',
     +
     + ## t/helper/test-diff-process-backend.c (new) ##
     +@@
     ++/*
     ++ * Test backend for the long-running diff process protocol
     ++ * (see diff-process.c and Documentation/gitattributes.adoc).
     ++ *
     ++ * Usage: test-tool diff-process-backend --mode=<mode> [--log=<path>]
     ++ *
     ++ * Implements the server side of the pkt-line handshake and a per-file
     ++ * response loop.  The --mode= switch selects the response shape
     ++ * (success, error, abort, crash, malformed hunks).
     ++ *
     ++ * Per-file request from Git:
     ++ *
     ++ *   packet:          git> command=hunks
     ++ *   packet:          git> pathname=<path>
     ++ *   packet:          git> 0000
     ++ *   packet:          git> OLD_CONTENT
     ++ *   packet:          git> 0000
     ++ *   packet:          git> NEW_CONTENT
     ++ *   packet:          git> 0000
     ++ *
     ++ * Response varies by --mode (default: whole-file):
     ++ *
     ++ *   whole-file   packet: git< hunk 1 <old_lines> 1 <new_lines>
     ++ *   fixed-hunk   packet: git< hunk 5 2 5 2
     ++ *   no-hunks     (no hunk packets)
     ++ *   bad-hunk     packet: git< hunk 999 1 999 1
     ++ *   bad-parse    packet: git< garbage not a hunk
     ++ *   bad-sync     packet: git< hunk 1 2 1 1
     ++ *   overlap      packet: git< hunk 1 5 1 5
     ++ *                packet: git< hunk 3 2 3 2
     ++ *   no-cap       (omits capability=hunks during handshake)
     ++ *   error        (status=error instead of status=success)
     ++ *   abort        (status=abort instead of status=success)
     ++ *   crash        exit(1) before sending any response
     ++ *
     ++ * All non-error/abort modes end with:
     ++ *
     ++ *   packet:          git< 0000
     ++ *   packet:          git< status=success
     ++ *   packet:          git< 0000
     ++ *
     ++ * Each request is logged to --log as:
     ++ *
     ++ *   command=<cmd> pathname=<path> old=<first line> new=<first line>
     ++ */
     ++
     ++#include "test-tool.h"
     ++#include "pkt-line.h"
     ++#include "parse-options.h"
     ++#include "strbuf.h"
     ++
     ++static FILE *logfile;
     ++
     ++enum mode {
     ++	MODE_WHOLE_FILE,
     ++	MODE_FIXED_HUNK,
     ++	MODE_NO_HUNKS,
     ++	MODE_BAD_HUNK,
     ++	MODE_BAD_PARSE,
     ++	MODE_BAD_SYNC,
     ++	MODE_OVERLAP,
     ++	MODE_NO_CAP,
     ++	MODE_ERROR,
     ++	MODE_ABORT,
     ++	MODE_CRASH,
     ++};
     ++
     ++static enum mode parse_mode(const char *s)
     ++{
     ++	if (!strcmp(s, "whole-file"))
     ++		return MODE_WHOLE_FILE;
     ++	if (!strcmp(s, "fixed-hunk"))
     ++		return MODE_FIXED_HUNK;
     ++	if (!strcmp(s, "no-hunks"))
     ++		return MODE_NO_HUNKS;
     ++	if (!strcmp(s, "bad-hunk"))
     ++		return MODE_BAD_HUNK;
     ++	if (!strcmp(s, "bad-parse"))
     ++		return MODE_BAD_PARSE;
     ++	if (!strcmp(s, "bad-sync"))
     ++		return MODE_BAD_SYNC;
     ++	if (!strcmp(s, "overlap"))
     ++		return MODE_OVERLAP;
     ++	if (!strcmp(s, "no-cap"))
     ++		return MODE_NO_CAP;
     ++	if (!strcmp(s, "error"))
     ++		return MODE_ERROR;
     ++	if (!strcmp(s, "abort"))
     ++		return MODE_ABORT;
     ++	if (!strcmp(s, "crash"))
     ++		return MODE_CRASH;
     ++	die("unknown --mode=%s", s);
     ++}
     ++
     ++/*
     ++ * Read "key=value" packets up to a flush, capturing "command" and
     ++ * "pathname".  Returns 1 if a request was read, 0 on EOF.
     ++ *
     ++ * The first packet uses the gentle variant so that a clean shutdown
     ++ * by Git (EOF) does not produce a spurious "the remote end hung up
     ++ * unexpectedly" on stderr.  Subsequent packets use the non-gentle
     ++ * variant: once inside a request, truncation is a protocol violation
     ++ * and dying loudly is the correct response.
     ++ */
     ++static int read_request_header(char **command, char **pathname)
     ++{
     ++	int first = 1;
     ++	char *line;
     ++
     ++	*command = *pathname = NULL;
     ++	for (;;) {
     ++		const char *value;
     ++
     ++		if (first) {
     ++			if (packet_read_line_gently(0, NULL, &line) < 0)
     ++				return 0;
     ++			first = 0;
     ++		} else {
     ++			line = packet_read_line(0, NULL);
     ++		}
     ++		if (!line)
     ++			break;
     ++		if (skip_prefix(line, "command=", &value))
     ++			*command = xstrdup(value);
     ++		else if (skip_prefix(line, "pathname=", &value))
     ++			*pathname = xstrdup(value);
     ++	}
     ++	return 1;
     ++}
     ++
     ++static size_t count_lines(const struct strbuf *buf)
     ++{
     ++	size_t lines = 0;
     ++
     ++	for (size_t i = 0; i < buf->len; i++)
     ++		if (buf->buf[i] == '\n')
     ++			lines++;
     ++
     ++	return lines + (buf->len > 0 && buf->buf[buf->len - 1] != '\n');
     ++}
     ++
     ++static void send_status(const char *status)
     ++{
     ++	packet_flush(1);
     ++	packet_write_fmt(1, "%s\n", status);
     ++	packet_flush(1);
     ++}
     ++
     ++static void respond(enum mode mode,
     ++		    const struct strbuf *old_buf,
     ++		    const struct strbuf *new_buf)
     ++{
     ++	switch (mode) {
     ++	case MODE_ERROR:
     ++		send_status("status=error");
     ++		return;
     ++	case MODE_ABORT:
     ++		send_status("status=abort");
     ++		return;
     ++	case MODE_CRASH:
     ++		exit(1);
     ++	case MODE_FIXED_HUNK:
     ++		packet_write_fmt(1, "hunk 5 2 5 2\n");
     ++		break;
     ++	case MODE_BAD_HUNK:
     ++		packet_write_fmt(1, "hunk 999 1 999 1\n");
     ++		break;
     ++	case MODE_BAD_PARSE:
     ++		packet_write_fmt(1, "garbage not a hunk\n");
     ++		break;
     ++	case MODE_BAD_SYNC:
     ++		packet_write_fmt(1, "hunk 1 2 1 1\n");
     ++		break;
     ++	case MODE_OVERLAP:
     ++		packet_write_fmt(1, "hunk 1 5 1 5\n");
     ++		packet_write_fmt(1, "hunk 3 2 3 2\n");
     ++		break;
     ++	case MODE_NO_HUNKS:
     ++		break;
     ++	case MODE_NO_CAP:
     ++	case MODE_WHOLE_FILE: {
     ++		size_t old_lines = count_lines(old_buf);
     ++		size_t new_lines = count_lines(new_buf);
     ++		/*
     ++		 * Match git diff output: start=0 when count=0
     ++		 * (empty file side), 1 otherwise.
     ++		 */
     ++		packet_write_fmt(1, "hunk %"PRIuMAX" %"PRIuMAX
     ++				 " %"PRIuMAX" %"PRIuMAX"\n",
     ++				 (uintmax_t)(old_lines ? 1 : 0),
     ++				 (uintmax_t)old_lines,
     ++				 (uintmax_t)(new_lines ? 1 : 0),
     ++				 (uintmax_t)new_lines);
     ++		break;
     ++	}
     ++	}
     ++	send_status("status=success");
     ++}
     ++
     ++static void command_loop(enum mode mode)
     ++{
     ++	for (;;) {
     ++		char *command = NULL, *pathname = NULL;
     ++		struct strbuf obuf = STRBUF_INIT;
     ++		struct strbuf nbuf = STRBUF_INIT;
     ++
     ++		if (!read_request_header(&command, &pathname))
     ++			break; /* EOF: Git closed its end */
     ++
     ++		read_packetized_to_strbuf(0, &obuf, 0);
     ++		read_packetized_to_strbuf(0, &nbuf, 0);
     ++
     ++		if (logfile) {
     ++			fprintf(logfile,
     ++				"command=%s pathname=%s old=%.*s new=%.*s\n",
     ++				command ? command : "(none)",
     ++				pathname ? pathname : "(none)",
     ++				(int)(strchrnul(obuf.buf, '\n') - obuf.buf),
     ++				obuf.buf,
     ++				(int)(strchrnul(nbuf.buf, '\n') - nbuf.buf),
     ++				nbuf.buf);
     ++			fflush(logfile);
     ++		}
     ++
     ++		respond(mode, &obuf, &nbuf);
     ++
     ++		free(command);
     ++		free(pathname);
     ++		strbuf_release(&obuf);
     ++		strbuf_release(&nbuf);
     ++	}
     ++}
     ++
     ++static void handshake(enum mode mode)
     ++{
     ++	char *line;
     ++
     ++	line = packet_read_line(0, NULL);
     ++	if (!line || strcmp(line, "git-diff-client"))
     ++		die("bad welcome: '%s'", line ? line : "(eof)");
     ++	line = packet_read_line(0, NULL);
     ++	if (!line || strcmp(line, "version=1"))
     ++		die("bad version: '%s'", line ? line : "(eof)");
     ++	if (packet_read_line(0, NULL))
     ++		die("expected flush after version");
     ++
     ++	packet_write_fmt(1, "git-diff-server\n");
     ++	packet_write_fmt(1, "version=1\n");
     ++	packet_flush(1);
     ++
     ++	/* Drain capabilities advertised by Git */
     ++	while ((line = packet_read_line(0, NULL)))
     ++		; /* drain */
     ++
     ++	/* Respond with our capabilities (or none for no-cap mode) */
     ++	if (mode != MODE_NO_CAP)
     ++		packet_write_fmt(1, "capability=hunks\n");
     ++	packet_flush(1);
     ++}
     ++
     ++static const char *const usage_str[] = {
     ++	"test-tool diff-process-backend --mode=<mode> [--log=<path>]",
     ++	NULL
     ++};
     ++
     ++int cmd__diff_process_backend(int argc, const char **argv)
     ++{
     ++	const char *mode_str = NULL, *log_path = NULL;
     ++	enum mode mode = MODE_WHOLE_FILE;
     ++	struct option options[] = {
     ++		OPT_STRING(0, "mode", &mode_str, "mode",
     ++			   "response shape: whole-file (default), fixed-hunk,"
     ++			   " no-hunks, bad-hunk, bad-sync, overlap, error,"
     ++			   " abort, crash"),
     ++		OPT_STRING(0, "log", &log_path, "path",
     ++			   "append per-request summary to this file"),
     ++		OPT_END()
     ++	};
     ++
     ++	argc = parse_options(argc, argv, NULL, options, usage_str, 0);
     ++	if (argc)
     ++		usage_with_options(usage_str, options);
     ++
     ++	if (mode_str)
     ++		mode = parse_mode(mode_str);
     ++
     ++	if (log_path) {
     ++		logfile = fopen(log_path, "a");
     ++		if (!logfile)
     ++			die_errno("failed to open log '%s'", log_path);
     ++	}
     ++
     ++	handshake(mode);
     ++	command_loop(mode);
     ++
     ++	if (logfile && fclose(logfile))
     ++		die_errno("error closing log");
     ++	return 0;
     ++}
     +
     + ## t/helper/test-tool.c ##
     +@@ t/helper/test-tool.c: static struct test_cmd cmds[] = {
     + 	{ "date", cmd__date },
     + 	{ "delete-gpgsig", cmd__delete_gpgsig },
     + 	{ "delta", cmd__delta },
     ++	{ "diff-process-backend", cmd__diff_process_backend },
     + 	{ "dir-iterator", cmd__dir_iterator },
     + 	{ "drop-caches", cmd__drop_caches },
     + 	{ "dump-cache-tree", cmd__dump_cache_tree },
     +
     + ## t/helper/test-tool.h ##
     +@@ t/helper/test-tool.h: int cmd__csprng(int argc, const char **argv);
     + int cmd__date(int argc, const char **argv);
     + int cmd__delta(int argc, const char **argv);
     + int cmd__delete_gpgsig(int argc, const char **argv);
     ++int cmd__diff_process_backend(int argc, const char **argv);
     + int cmd__dir_iterator(int argc, const char **argv);
     + int cmd__drop_caches(int argc, const char **argv);
     + int cmd__dump_cache_tree(int argc, const char **argv);
      
       ## t/meson.build ##
      @@ t/meson.build: integration_tests = [
     @@ t/t4080-diff-process.sh (new)
      +
      +. ./test-lib.sh
      +
     -+if test_have_prereq PYTHON
     -+then
     -+	PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
     -+fi
     ++# See t/helper/test-diff-process-backend.c for the backend implementation
     ++# and available --mode= options.
      +
     -+#
     -+# A single parametric diff process.
     -+# Usage: diff-process-backend --mode=<mode> [--log=<path>]
     -+#
     -+# Modes:
     -+#   whole-file  - report all lines as changed (default)
     -+#   fixed-hunk  - always report hunk 5 2 5 2
     -+#   bad-hunk    - report out-of-bounds hunk 999 1 999 1
     -+#   bad-sync    - report hunk with mismatched unchanged totals
     -+#   overlap     - report two overlapping hunks
     -+#   no-hunks   - return no hunks (files considered equivalent)
     -+#   error       - return status=error for every request
     -+#   abort       - return status=abort for every request
     -+#   crash       - read one request then exit without responding
     -+#
     -+setup_backend () {
     -+	cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF
     -+	import sys, os
     -+
     -+	def read_pkt():
     -+	    hdr = sys.stdin.buffer.read(4)
     -+	    if len(hdr) < 4: return None
     -+	    length = int(hdr, 16)
     -+	    if length == 0: return ""
     -+	    data = sys.stdin.buffer.read(length - 4)
     -+	    return data.decode().rstrip("\n")
     -+
     -+	def write_pkt(line):
     -+	    data = (line + "\n").encode()
     -+	    sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
     -+	    sys.stdout.buffer.flush()
     -+
     -+	def write_flush():
     -+	    sys.stdout.buffer.write(b"0000")
     -+	    sys.stdout.buffer.flush()
     -+
     -+	def read_content():
     -+	    chunks = []
     -+	    while True:
     -+	        hdr = sys.stdin.buffer.read(4)
     -+	        if len(hdr) < 4: break
     -+	        length = int(hdr, 16)
     -+	        if length == 0: break
     -+	        chunks.append(sys.stdin.buffer.read(length - 4))
     -+	    return b"".join(chunks)
     -+
     -+	mode = "whole-file"
     -+	logfile = None
     -+	for arg in sys.argv[1:]:
     -+	    if arg.startswith("--mode="):
     -+	        mode = arg[7:]
     -+	    elif arg.startswith("--log="):
     -+	        logfile = open(arg[6:], "a")
     -+
     -+	def log(msg):
     -+	    if logfile:
     -+	        logfile.write(msg + "\n")
     -+	        logfile.flush()
     -+
     -+	# Handshake
     -+	assert read_pkt() == "git-diff-client"
     -+	assert read_pkt() == "version=1"
     -+	read_pkt()
     -+	write_pkt("git-diff-server")
     -+	write_pkt("version=1")
     -+	write_flush()
     -+	while True:
     -+	    p = read_pkt()
     -+	    if p == "": break
     -+	write_pkt("capability=hunks")
     -+	write_flush()
     -+
     -+	log("ready")
     -+
     -+	while True:
     -+	    cmd = None
     -+	    pathname = None
     -+	    while True:
     -+	        p = read_pkt()
     -+	        if p is None: sys.exit(0)
     -+	        if p == "": break
     -+	        if p.startswith("command="): cmd = p.split("=",1)[1]
     -+	        if p.startswith("pathname="): pathname = p.split("=",1)[1]
     -+	    if cmd is None: sys.exit(0)
     -+	    old = read_content()
     -+	    new = read_content()
     -+	    old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
     -+	    new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
     -+	    log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
     -+
     -+	    if mode == "error":
     -+	        write_flush()
     -+	        write_pkt("status=error")
     -+	        write_flush()
     -+	        continue
     -+
     -+	    if mode == "abort":
     -+	        write_flush()
     -+	        write_pkt("status=abort")
     -+	        write_flush()
     -+	        continue
     -+
     -+	    if mode == "crash":
     -+	        sys.exit(1)
     -+
     -+	    if cmd == "hunks":
     -+	        if mode == "fixed-hunk":
     -+	            write_pkt("hunk 5 2 5 2")
     -+	        elif mode == "bad-hunk":
     -+	            write_pkt("hunk 999 1 999 1")
     -+	        elif mode == "bad-sync":
     -+	            write_pkt("hunk 1 2 1 1")
     -+	        elif mode == "overlap":
     -+	            write_pkt("hunk 1 5 1 5")
     -+	            write_pkt("hunk 3 2 3 2")
     -+	        elif mode == "no-hunks":
     -+	            pass
     -+	        else:
     -+	            ol = old.count(b"\n")
     -+	            nl = new.count(b"\n")
     -+	            write_pkt(f"hunk 1 {ol} 1 {nl}")
     -+	        write_flush()
     -+	        write_pkt("status=success")
     -+	        write_flush()
     -+	    else:
     -+	        write_flush()
     -+	        write_pkt("status=error")
     -+	        write_flush()
     -+	PYEOF
     -+	write_script diff-process-backend <<-SHEOF
     -+	exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@"
     -+	SHEOF
     -+}
     -+
     -+BACKEND="./diff-process-backend"
     ++BACKEND="test-tool diff-process-backend"
      +
     -+test_expect_success PYTHON 'setup' '
     -+	setup_backend &&
     ++test_expect_success 'setup' '
      +	echo "*.c diff=cdiff" >.gitattributes &&
      +	git add .gitattributes &&
      +
     @@ t/t4080-diff-process.sh (new)
      +	git add worddiff.c &&
      +
      +	# newfile.c: single-line function, value changes 42 -> 99.
     -+	# Used by: new file, --exit-code, multiple drivers.
     ++	# Used by: modified file, --exit-code, multiple drivers.
      +	cat >newfile.c <<-\EOF &&
      +	int new_func(void) { return 42; }
      +	EOF
     @@ t/t4080-diff-process.sh (new)
      +# Core behavior: the tool controls which lines are marked as changed.
      +#
      +
     -+test_expect_success PYTHON 'diff process hunk boundaries affect output' '
     ++test_expect_success 'diff process hunk boundaries affect output' '
      +	# The file has changes at lines 5-6 and 9-10, but fixed-hunk
      +	# only reports lines 5-6 as changed.  Lines 9-10 should not
      +	# appear as changed in the output.
     @@ t/t4080-diff-process.sh (new)
      +	test_grep ! "^+NEW10" actual
      +'
      +
     -+test_expect_success PYTHON 'diff process works with new file' '
     -+	rm -f backend.log &&
     ++test_expect_success 'diff process works with modified file' '
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
      +		diff -- newfile.c >actual 2>stderr &&
      +	test_grep "return 99" actual &&
     @@ t/t4080-diff-process.sh (new)
      +	test_must_be_empty stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process works with added file (empty old side)' '
     ++test_expect_success 'diff process works with added file (empty old side)' '
      +	cat >added.c <<-\EOF &&
      +	int added(void) { return 1; }
      +	EOF
      +	git add added.c &&
      +
     -+	rm -f backend.log &&
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
      +		diff --cached -- added.c >actual 2>stderr &&
      +	test_grep "added" actual &&
     @@ t/t4080-diff-process.sh (new)
      +	test_must_be_empty stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process skipped for binary files' '
     ++test_expect_success 'diff process works with deleted file (empty new side)' '
     ++	git add added.c &&
     ++	git commit -m "commit added.c" &&
     ++	git rm added.c &&
     ++
     ++	test_when_finished "rm -f backend.log" &&
     ++	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
     ++		diff --cached -- added.c >actual 2>stderr &&
     ++	test_grep "deleted file" actual &&
     ++	test_grep "pathname=added.c" backend.log &&
     ++	test_must_be_empty stderr
     ++'
     ++
     ++test_expect_success 'diff process skipped for binary files' '
      +	printf "\\0binary" >binary.c &&
      +	git add binary.c &&
      +	git commit -m "add binary" &&
      +	printf "\\0changed" >binary.c &&
      +
     -+	rm -f backend.log &&
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
      +		diff -- binary.c >actual &&
      +	test_grep "Binary files" actual &&
      +	test_path_is_missing backend.log
      +'
      +
     -+test_expect_success PYTHON 'diff process not consulted for unmatched driver' '
     ++test_expect_success 'diff process not consulted for unmatched driver' '
      +	echo "not tracked by cdiff" >unmatched.txt &&
      +	git add unmatched.txt &&
      +	git commit -m "add unmatched.txt" &&
      +
      +	echo "modified" >unmatched.txt &&
      +
     -+	rm -f backend.log &&
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
      +		diff -- unmatched.txt >actual &&
      +	test_grep "modified" actual &&
      +	test_path_is_missing backend.log
      +'
      +
     -+test_expect_success PYTHON 'multiple drivers use separate processes' '
     ++test_expect_success 'multiple drivers use separate processes' '
      +	echo "*.h diff=hdiff" >>.gitattributes &&
      +	git add .gitattributes &&
      +
     @@ t/t4080-diff-process.sh (new)
      +	int header(void) { return 2; }
      +	EOF
      +
     -+	rm -f backend-c.log backend-h.log &&
     ++	test_when_finished "rm -f backend-c.log backend-h.log" &&
      +	git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \
      +	    -c diff.hdiff.process="$BACKEND --log=backend-h.log" \
      +		diff -- newfile.c multi.h >actual 2>stderr &&
     @@ t/t4080-diff-process.sh (new)
      +	test_must_be_empty stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process works alongside textconv' '
     ++test_expect_success 'diff process works alongside textconv' '
      +	write_script uppercase-filter <<-\EOF &&
      +	tr "a-z" "A-Z" <"$1"
      +	EOF
     @@ t/t4080-diff-process.sh (new)
      +	goodbye world
      +	EOF
      +
     -+	rm -f backend.log &&
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.textconv="./uppercase-filter" \
      +	    -c diff.cdiff.process="$BACKEND --log=backend.log" \
      +		diff -- textconv.c >actual 2>stderr &&
     @@ t/t4080-diff-process.sh (new)
      +# Downstream features: word diff, log, equivalent files, exit code.
      +#
      +
     -+test_expect_success PYTHON 'diff process with --word-diff' '
     -+	rm -f backend.log &&
     ++test_expect_success 'diff process with --word-diff' '
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
      +		diff --word-diff worddiff.c >actual 2>stderr &&
      +	test_grep "\[-1;-\]" actual &&
     @@ t/t4080-diff-process.sh (new)
      +	test_must_be_empty stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process works with git log -p' '
     ++test_expect_success 'diff process works with git log -p' '
      +	# With no-hunks mode, the tool says the files are equivalent,
      +	# so log -p should show the commit but no diff content.
     -+	rm -f backend.log &&
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
      +		log -1 -p -- logtest.c >actual 2>stderr &&
      +	test_grep "change logtest.c" actual &&
     @@ t/t4080-diff-process.sh (new)
      +	test_must_be_empty stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process no hunks suppresses diff output' '
     ++test_expect_success 'diff process no hunks suppresses diff output' '
      +	cat >nohunks.c <<-\EOF &&
      +	int zero(void) { return 0; }
      +	EOF
     @@ t/t4080-diff-process.sh (new)
      +	test_must_be_empty actual
      +'
      +
     -+test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' '
     ++test_expect_success 'diff process no hunks with --exit-code returns success' '
      +	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
      +		diff --exit-code nohunks.c
      +'
      +
     -+test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' '
     ++test_expect_success 'diff process with --exit-code and hunks returns failure' '
      +	test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \
      +		diff --exit-code newfile.c
      +'
     @@ t/t4080-diff-process.sh (new)
      +# Bypass mechanisms: flags and commands that skip the diff process.
      +#
      +
     -+test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
     -+	rm -f backend.log &&
     ++test_expect_success 'diff process bypassed by --diff-algorithm' '
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
      +		diff --diff-algorithm=patience worddiff.c >actual &&
      +	test_grep "return 999" actual &&
      +	test_path_is_missing backend.log
      +'
      +
     -+test_expect_success PYTHON 'diff process not used by --stat' '
     -+	rm -f backend.log &&
     ++test_expect_success 'diff process not used by --stat' '
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
      +		diff --stat worddiff.c >actual &&
      +	test_grep "worddiff.c" actual &&
     @@ t/t4080-diff-process.sh (new)
      +# Error handling and fallback.
      +#
      +
     -+test_expect_success PYTHON 'diff process fallback on tool error status' '
     -+	rm -f backend.log &&
     ++test_expect_success 'diff process fallback on tool error status' '
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
      +		diff boundary.c >actual 2>stderr &&
      +	# Fallback produces the full builtin diff (both change regions).
     @@ t/t4080-diff-process.sh (new)
      +	test_grep "diff process.*failed" stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process error keeps tool available for next file' '
     -+	rm -f backend.log &&
     ++test_expect_success 'diff process error keeps tool available for next file' '
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
      +		diff -- one.c two.c >actual 2>stderr &&
      +	# Unlike abort, error keeps the tool available: both files
     @@ t/t4080-diff-process.sh (new)
      +	test_grep "pathname=one.c" backend.log &&
      +	test_grep "pathname=two.c" backend.log &&
      +	test_grep "return 10" actual &&
     -+	test_grep "return 20" actual
     ++	test_grep "return 20" actual &&
     ++	test_grep "diff process.*failed" stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process abort disables for session' '
     -+	rm -f backend.log &&
     ++test_expect_success 'diff process abort disables for session' '
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
     -+		diff -- one.c two.c >actual &&
     ++		diff -- one.c two.c >actual 2>stderr &&
      +	# Both files should still produce diff output via fallback.
      +	test_grep "return 10" actual &&
      +	test_grep "return 20" actual &&
      +	# The tool aborts on the first file and git clears its
      +	# capability.  The second file never contacts the tool.
      +	test_grep "pathname=one.c" backend.log &&
     -+	test_grep ! "pathname=two.c" backend.log
     ++	test_grep ! "pathname=two.c" backend.log &&
     ++	test_must_be_empty stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process fallback on tool crash' '
     ++test_expect_success 'diff process fallback on tool crash' '
      +	git -c diff.cdiff.process="$BACKEND --mode=crash" \
      +		diff boundary.c >actual 2>stderr &&
      +	test_grep "^-OLD5" actual &&
     @@ t/t4080-diff-process.sh (new)
      +	test_grep "diff process.*failed" stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process startup failure only warns once' '
     ++test_expect_success 'diff process startup failure only warns once' '
      +	git -c diff.cdiff.process="/nonexistent/tool" \
      +		diff -- one.c two.c >actual 2>stderr &&
      +	# Both files produce diff output via fallback.
     @@ t/t4080-diff-process.sh (new)
      +	test_line_count = 1 warnings
      +'
      +
     -+test_expect_success PYTHON 'diff process fallback on bad hunks' '
     ++
     ++test_expect_success 'diff process fallback on bad hunks' '
      +	git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
      +		diff boundary.c >actual 2>stderr &&
      +	test_grep "^-OLD5" actual &&
      +	test_grep "^+NEW5" actual &&
      +	test_grep "^-OLD9" actual &&
      +	test_grep "^+NEW9" actual &&
     -+	# Invalid hunks are caught by xdiff validation, not the
     -+	# protocol layer, so no warning is emitted.
     -+	test_must_be_empty stderr
     ++	test_grep "exceeds.*lines" stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' '
     ++test_expect_success 'diff process fallback on mismatched unchanged totals' '
      +	cat >synctest.c <<-\EOF &&
      +	line1
      +	line2
     @@ t/t4080-diff-process.sh (new)
      +	# The synchronization invariant fails and git falls back.
      +	git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \
      +		diff synctest.c >actual 2>stderr &&
     -+	test_grep "changed" actual
     ++	test_grep "changed" actual &&
     ++	test_grep "unchanged line count mismatch" stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
     ++test_expect_success 'diff process fallback on overlapping hunks' '
      +	# boundary.c has 10 lines, so both hunks are in bounds
      +	# but they overlap at lines 3-5, triggering the ordering check.
      +	git -c diff.cdiff.process="$BACKEND --mode=overlap" \
      +		diff boundary.c >actual 2>stderr &&
     -+	test_grep "NEW5" actual
     ++	test_grep "NEW5" actual &&
     ++	test_grep "overlaps with previous" stderr
     ++'
     ++
     ++test_expect_success 'diff process fallback on malformed hunk line' '
     ++	git -c diff.cdiff.process="$BACKEND --mode=bad-parse" \
     ++		diff boundary.c >actual 2>stderr &&
     ++	test_grep "^-OLD5" actual &&
     ++	test_grep "^+NEW5" actual
     ++'
     ++
     ++test_expect_success 'diff process skipped when tool omits capability' '
     ++	git -c diff.cdiff.process="$BACKEND --mode=no-cap" \
     ++		diff boundary.c >actual 2>stderr &&
     ++	test_grep "^-OLD5" actual &&
     ++	test_grep "^+NEW5" actual &&
     ++	test_must_be_empty stderr
      +'
      +
      +test_done
 5:  f4fd9aa682 ! 5:  6ec6716ea4 diff: bypass diff process with --no-ext-diff and in format-patch
     @@ Commit message
          external tool.
      
          Document that --diff-algorithm also bypasses the diff process,
     -    since it sets ignore_driver_algorithm which diff_process_fill_hunks
     -    already checks.
     +    since it forces the builtin algorithm.
      
          Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
      
     @@ diff.h: struct diff_flags {
       	/**
      
       ## t/t4080-diff-process.sh ##
     -@@ t/t4080-diff-process.sh: test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
     +@@ t/t4080-diff-process.sh: test_expect_success 'diff process bypassed by --diff-algorithm' '
       	test_path_is_missing backend.log
       '
       
     -+test_expect_success PYTHON 'diff process bypassed by --no-ext-diff' '
     -+	rm -f backend.log &&
     ++test_expect_success 'diff process bypassed by --no-ext-diff' '
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
      +		diff --no-ext-diff worddiff.c >actual &&
      +	test_grep "return 999" actual &&
      +	test_path_is_missing backend.log
      +'
      +
     -+test_expect_success PYTHON 'diff process not used by format-patch' '
     -+	rm -f backend.log &&
     ++test_expect_success 'diff process not used by format-patch' '
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
      +		format-patch -1 --stdout -- logtest.c >actual &&
      +	test_grep "return 2" actual &&
      +	test_path_is_missing backend.log
      +'
      +
     - test_expect_success PYTHON 'diff process not used by --stat' '
     - 	rm -f backend.log &&
     + test_expect_success 'diff process not used by --stat' '
     + 	test_when_finished "rm -f backend.log" &&
       	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
 6:  370e766978 ! 6:  3dadafa1bc blame: consult diff process for no-hunk detection
     @@ Commit message
          The consultation happens at the pass_blame_to_parent() callsite
          using diff_process_fill_hunks(), matching how builtin_diff() in
          diff.c uses the same function.  A new diff_hunks_xpp() variant
     -    accepts a pre-populated xpparam_t for this callsite, while the
     -    existing diff_hunks() retains its original signature and behavior.
     -    The copy-detection callsite is unaffected since it does not use
     -    the diff process.
     +    accepts a pre-populated xpparam_t so callers can pass external
     +    hunks, while the existing diff_hunks() retains its original
     +    signature and behavior.  The copy-detection callsite is
     +    unaffected since it does not use the diff process.
      
          The subprocess is long-running (one startup cost amortized
          across the blame traversal), but each commit in the file's
     @@ blame.c: static void pass_blame_to_parent(struct blame_scoreboard *sb,
       		    parent, target, 0);
      
       ## t/t4080-diff-process.sh ##
     -@@ t/t4080-diff-process.sh: test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
     - 	test_grep "NEW5" actual
     +@@ t/t4080-diff-process.sh: test_expect_success 'diff process skipped when tool omits capability' '
     + 	test_must_be_empty stderr
       '
       
      +#
      +# Blame integration.
      +#
      +
     -+test_expect_success PYTHON 'blame uses tool-provided hunks' '
     ++test_expect_success 'blame uses tool-provided hunks' '
      +	cat >blame-hunk.c <<-\EOF &&
      +	line1
      +	line2
     @@ t/t4080-diff-process.sh: test_expect_success PYTHON 'diff process fallback on ov
      +	test_grep "$CHANGE" line6
      +'
      +
     -+test_expect_success PYTHON 'blame skips commits with no hunks from diff process' '
     ++test_expect_success 'blame skips commits with no hunks from diff process' '
      +	cat >blame.c <<-\EOF &&
     -+	int main(void)
     -+	{
     -+	    return 0;
     ++	int main(void) {
     ++	return 0;
      +	}
      +	EOF
      +	git add blame.c &&
     @@ t/t4080-diff-process.sh: test_expect_success PYTHON 'diff process fallback on ov
      +	cat >blame.c <<-\EOF &&
      +	int main(void)
      +	{
     -+	        return 0;
     ++	return 0;
      +	}
      +	EOF
      +	git add blame.c &&
     @@ t/t4080-diff-process.sh: test_expect_success PYTHON 'diff process fallback on ov
      +	test_grep "$ORIG_COMMIT" with
      +'
      +
     -+test_expect_success PYTHON 'blame --no-ext-diff bypasses diff process' '
     -+	rm -f backend.log &&
     ++test_expect_success 'blame --no-ext-diff bypasses diff process' '
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
      +		blame --no-ext-diff blame.c >actual &&
      +	# Without the process, blame attributes the reformat commit normally.
     @@ t/t4080-diff-process.sh: test_expect_success PYTHON 'diff process fallback on ov
      +	test_path_is_missing backend.log
      +'
      +
     -+test_expect_success PYTHON 'blame --no-ext-diff uses builtin hunks' '
     ++test_expect_success 'blame --no-ext-diff uses builtin hunks' '
      +	# fixed-hunk mode would narrow blame to lines 5-6, but
      +	# --no-ext-diff should bypass it and use the builtin diff.
     -+	rm -f backend.log &&
     ++	test_when_finished "rm -f backend.log" &&
      +	git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk --log=backend.log" \
      +		blame --no-ext-diff blame-hunk.c >actual &&
      +	# Builtin diff attributes lines 9-10 to the change commit.

-- 
gitgitgadget

^ permalink raw reply

* [PATCH v4 1/6] xdiff: support external hunks via xpparam_t
From: Michael Montalbo via GitGitGadget @ 2026-06-14 18:59 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.v4.git.1781463564.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add two new xpparam_t fields (external_hunks, external_hunks_nr)
that let callers supply pre-computed hunks.  When set, xdl_diff()
populates the changed[] arrays from these hunks instead of running
the diff algorithm, then continues through compaction and emission
as usual.

Validate supplied hunks before use: reject out-of-bounds line
numbers, overlapping or out-of-order hunks, negative counts, and
violations of the synchronization invariant (unchanged line counts
must match between files).  On validation failure, fall back to
the builtin diff algorithm; this re-runs xdl_prepare_env() since
the first call may have dirtied the changed[] arrays.

Skip trim_common_tail() in xdi_diff() when external hunks are
present, since external hunks reference line numbers in the
original content.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 xdiff-interface.c |   7 ++-
 xdiff/xdiff.h     |  14 ++++++
 xdiff/xdiffi.c    | 123 +++++++++++++++++++++++++++++++++++++++++++++-
 xdiff/xprepare.c  |  10 ++++
 xdiff/xprepare.h  |   1 +
 5 files changed, 152 insertions(+), 3 deletions(-)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 5ee2b96d0a..76a24fc589 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -124,7 +124,12 @@ int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t co
 	if (mf1->size > MAX_XDIFF_SIZE || mf2->size > MAX_XDIFF_SIZE)
 		return -1;
 
-	if (!xecfg->ctxlen && !(xecfg->flags & XDL_EMIT_FUNCCONTEXT))
+	/*
+	 * External hunks reference line numbers in the original content;
+	 * trimming the tail would change line counts and invalidate them.
+	 */
+	if (!xpp->external_hunks &&
+	    !xecfg->ctxlen && !(xecfg->flags & XDL_EMIT_FUNCCONTEXT))
 		trim_common_tail(&a, &b);
 
 	return xdl_diff(&a, &b, xpp, xecfg, xecb);
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index dc370712e9..dd4915fe16 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -78,6 +78,16 @@ typedef struct s_mmbuffer {
 	long size;
 } mmbuffer_t;
 
+/*
+ * Hunk descriptor for externally computed diffs.
+ * Line numbers are 1-based; a start of 0 is accepted when
+ * count is 0 (empty file side, matching git diff output).
+ */
+struct xdl_hunk {
+	long old_start, old_count;
+	long new_start, new_count;
+};
+
 typedef struct s_xpparam {
 	unsigned long flags;
 
@@ -88,6 +98,10 @@ typedef struct s_xpparam {
 	/* See Documentation/diff-options.adoc. */
 	char **anchors;
 	size_t anchors_nr;
+
+	/* Externally computed hunks: bypass the diff algorithm.  Owned by caller. */
+	struct xdl_hunk *external_hunks;
+	size_t external_hunks_nr;
 } xpparam_t;
 
 typedef struct s_xdemitcb {
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index c5a892f91e..bf820b52e3 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -1085,16 +1085,135 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
 	}
 }
 
+/*
+ * Populate the changed[] arrays from externally supplied hunks,
+ * bypassing the diff algorithm.  Validates that hunks are in order,
+ * non-overlapping, and within bounds.
+ *
+ * Returns 0 on success, -1 on validation failure.
+ */
+static int xdl_populate_hunks_from_external(xdfenv_t *xe,
+					    struct xdl_hunk *hunks,
+					    size_t nr_hunks)
+{
+	size_t i;
+	long j, prev_old_end = 0, prev_new_end = 0;
+	long changed_old = 0, changed_new = 0;
+
+	/*
+	 * xdl_prepare_env() may dirty changed[] via xdl_cleanup_records().
+	 * Clear them so only the external hunks are marked.
+	 */
+	xdl_clear_changed(&xe->xdf1);
+	xdl_clear_changed(&xe->xdf2);
+
+	for (i = 0; i < nr_hunks; i++) {
+		struct xdl_hunk *h = &hunks[i];
+
+		if (h->old_count < 0 || h->new_count < 0) {
+			warning("diff process hunk %"PRIuMAX": "
+				"negative count (old=%ld, new=%ld)",
+				(uintmax_t)(i + 1),
+				h->old_count, h->new_count);
+			return -1;
+		}
+		if (h->old_start < 1 || h->new_start < 1) {
+			warning("diff process hunk %"PRIuMAX": "
+				"start must be >= 1 (old=%ld, new=%ld)",
+				(uintmax_t)(i + 1),
+				h->old_start, h->new_start);
+			return -1;
+		}
+
+		/*
+		 * Range must fit: start + count - 1 <= nrec,
+		 * rewritten to avoid overflow.  Same for both sides.
+		 *
+		 * When count is 0 (pure insert/delete) the check
+		 * reduces to 0 > nrec - start + 1, which rejects
+		 * start > nrec + 1 and allows start == nrec + 1
+		 * (the position after the last line).
+		 */
+		if (h->old_count > (long)xe->xdf1.nrec - h->old_start + 1) {
+			warning("diff process hunk %"PRIuMAX": "
+				"old range %ld+%ld exceeds %lu lines",
+				(uintmax_t)(i + 1),
+				h->old_start, h->old_count,
+				(unsigned long)xe->xdf1.nrec);
+			return -1;
+		}
+		if (h->new_count > (long)xe->xdf2.nrec - h->new_start + 1) {
+			warning("diff process hunk %"PRIuMAX": "
+				"new range %ld+%ld exceeds %lu lines",
+				(uintmax_t)(i + 1),
+				h->new_start, h->new_count,
+				(unsigned long)xe->xdf2.nrec);
+			return -1;
+		}
+
+		/* Ordering: no overlap with previous hunk (adjacent is OK) */
+		if (h->old_start < prev_old_end ||
+		    h->new_start < prev_new_end) {
+			warning("diff process hunk %"PRIuMAX": "
+				"overlaps with previous hunk",
+				(uintmax_t)(i + 1));
+			return -1;
+		}
+
+		for (j = 0; j < h->old_count; j++)
+			xe->xdf1.changed[h->old_start - 1 + j] = true;
+		for (j = 0; j < h->new_count; j++)
+			xe->xdf2.changed[h->new_start - 1 + j] = true;
+
+		prev_old_end = h->old_start + h->old_count;
+		prev_new_end = h->new_start + h->new_count;
+	}
+
+	/*
+	 * Synchronization invariant: unchanged line counts must match.
+	 * Otherwise xdl_build_script() would walk off one array.
+	 *
+	 * Count changed lines from the arrays rather than accumulating
+	 * during the loop to avoid any overflow in the summation.
+	 */
+	for (j = 0; j < (long)xe->xdf1.nrec; j++)
+		if (xe->xdf1.changed[j])
+			changed_old++;
+	for (j = 0; j < (long)xe->xdf2.nrec; j++)
+		if (xe->xdf2.changed[j])
+			changed_new++;
+	if ((long)xe->xdf1.nrec - changed_old !=
+	    (long)xe->xdf2.nrec - changed_new) {
+		warning("diff process: unchanged line count mismatch "
+			"(old: %ld unchanged, new: %ld unchanged)",
+			(long)xe->xdf1.nrec - changed_old,
+			(long)xe->xdf2.nrec - changed_new);
+		return -1;
+	}
+
+	return 0;
+}
+
 int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
 	xdchange_t *xscr;
 	xdfenv_t xe;
 	emit_func_t ef = xecfg->hunk_func ? xdl_call_hunk_func : xdl_emit_diff;
 
-	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0) {
+	if (xpp->external_hunks) {
+		if (xdl_prepare_env(mf1, mf2, xpp, &xe) < 0)
+			return -1;
+		if (xdl_populate_hunks_from_external(&xe,
+						     xpp->external_hunks,
+						     xpp->external_hunks_nr) == 0)
+			goto diff_done;
+		xdl_free_env(&xe);
+	}
 
+	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0)
 		return -1;
-	}
+
+diff_done:
 	if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
 	    xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 ||
 	    xdl_build_script(&xe, &xscr) < 0) {
diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 11bada2608..f4ab935332 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -471,3 +471,13 @@ int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 
 	return 0;
 }
+
+/*
+ * Reset the changed[] array so that no lines are marked as changed.
+ * Also clears the sentinel slots at changed[-1] and changed[nrec]
+ * that xdl_change_compact() relies on during backward scans.
+ */
+void xdl_clear_changed(xdfile_t *xdf)
+{
+	memset(xdf->changed - 1, 0, (xdf->nrec + 2) * sizeof(bool));
+}
diff --git a/xdiff/xprepare.h b/xdiff/xprepare.h
index 947d9fc1bb..0413baf07b 100644
--- a/xdiff/xprepare.h
+++ b/xdiff/xprepare.h
@@ -28,6 +28,7 @@
 int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		    xdfenv_t *xe);
 void xdl_free_env(xdfenv_t *xe);
+void xdl_clear_changed(xdfile_t *xdf);
 
 
 
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v4 2/6] userdiff: add diff.<driver>.process config
From: Michael Montalbo via GitGitGadget @ 2026-06-14 18:59 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.v4.git.1781463564.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add the process field to struct userdiff_driver and teach the
config parser to populate it from diff.<driver>.process.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 userdiff.c | 7 +++++++
 userdiff.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/userdiff.c b/userdiff.c
index b5412e6bc3..7547874aa2 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -509,6 +509,13 @@ int userdiff_config(const char *k, const char *v)
 		drv->algorithm = drv->algorithm_owned;
 		return ret;
 	}
+	if (!strcmp(type, "process")) {
+		int ret;
+		FREE_AND_NULL(drv->process_owned);
+		ret = git_config_string(&drv->process_owned, k, v);
+		drv->process = drv->process_owned;
+		return ret;
+	}
 
 	return 0;
 }
diff --git a/userdiff.h b/userdiff.h
index 827361b0bc..51c26e0d41 100644
--- a/userdiff.h
+++ b/userdiff.h
@@ -31,6 +31,8 @@ struct userdiff_driver {
 	char *textconv_owned;
 	struct notes_cache *textconv_cache;
 	int textconv_want_cache;
+	const char *process;
+	char *process_owned;
 };
 enum userdiff_driver_type {
 	USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v4 3/6] sub-process: separate process lifecycle from hashmap management
From: Michael Montalbo via GitGitGadget @ 2026-06-14 18:59 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.v4.git.1781463564.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

subprocess_start() and subprocess_stop() couple two concerns:
managing a child process (setup, handshake, teardown) and
managing a hashmap that indexes running processes by command
string.  The hashmap suits callers like convert.c where many
files may share one filter process looked up by name, but
callers that manage process lifetime through their own data
structures do not need it.

Extract subprocess_start_command() and subprocess_stop_command()
so callers can reuse the child process setup and handshake
machinery without maintaining a hashmap.  subprocess_start()
and subprocess_stop() become thin wrappers that add hashmap
operations on top.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 sub-process.c | 28 +++++++++++++++++++++++-----
 sub-process.h |  9 ++++++++-
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/sub-process.c b/sub-process.c
index 83bf0a0e82..5468939338 100644
--- a/sub-process.c
+++ b/sub-process.c
@@ -49,7 +49,7 @@ int subprocess_read_status(int fd, struct strbuf *status)
 	return (len < 0) ? len : 0;
 }
 
-void subprocess_stop(struct hashmap *hashmap, struct subprocess_entry *entry)
+void subprocess_stop_command(struct subprocess_entry *entry)
 {
 	if (!entry)
 		return;
@@ -57,7 +57,14 @@ void subprocess_stop(struct hashmap *hashmap, struct subprocess_entry *entry)
 	entry->process.clean_on_exit = 0;
 	kill(entry->process.pid, SIGTERM);
 	finish_command(&entry->process);
+}
 
+void subprocess_stop(struct hashmap *hashmap, struct subprocess_entry *entry)
+{
+	if (!entry)
+		return;
+
+	subprocess_stop_command(entry);
 	hashmap_remove(hashmap, &entry->ent, NULL);
 }
 
@@ -72,7 +79,7 @@ static void subprocess_exit_handler(struct child_process *process)
 	finish_command(process);
 }
 
-int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd,
+int subprocess_start_command(struct subprocess_entry *entry, const char *cmd,
 	subprocess_start_fn startfn)
 {
 	int err;
@@ -96,15 +103,26 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, co
 		return err;
 	}
 
-	hashmap_entry_init(&entry->ent, strhash(cmd));
-
 	err = startfn(entry);
 	if (err) {
 		error("initialization for subprocess '%s' failed", cmd);
-		subprocess_stop(hashmap, entry);
+		subprocess_stop_command(entry);
 		return err;
 	}
 
+	return 0;
+}
+
+int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd,
+	subprocess_start_fn startfn)
+{
+	int err;
+
+	err = subprocess_start_command(entry, cmd, startfn);
+	if (err)
+		return err;
+
+	hashmap_entry_init(&entry->ent, strhash(cmd));
 	hashmap_add(hashmap, &entry->ent);
 	return 0;
 }
diff --git a/sub-process.h b/sub-process.h
index bfc3959a1b..45f1b8e5e3 100644
--- a/sub-process.h
+++ b/sub-process.h
@@ -52,10 +52,17 @@ int cmd2process_cmp(const void *unused_cmp_data,
  */
 typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
 
-/* Start a subprocess and add it to the subprocess hashmap. */
+/* Start a subprocess and run the startfn (typically handshake). */
+int subprocess_start_command(struct subprocess_entry *entry, const char *cmd,
+		subprocess_start_fn startfn);
+
+/* Start a subprocess, run startfn, and add it to the subprocess hashmap. */
 int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd,
 		subprocess_start_fn startfn);
 
+/* Kill a subprocess. */
+void subprocess_stop_command(struct subprocess_entry *entry);
+
 /* Kill a subprocess and remove it from the subprocess hashmap. */
 void subprocess_stop(struct hashmap *hashmap, struct subprocess_entry *entry);
 
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v4 4/6] diff: add long-running diff process via diff.<driver>.process
From: Michael Montalbo via GitGitGadget @ 2026-06-14 18:59 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.v4.git.1781463564.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add support for external diff processes that communicate via the
long-running process protocol (pkt-line over stdin/stdout).

A diff process is configured per userdiff driver:

    [diff "cdiff"]
        process = /path/to/diff-tool

The tool provides custom line-matching: it receives file pairs
and returns hunks that reference line numbers in the content.
When textconv is also configured, the tool receives the
textconv-transformed content.  The tool controls which lines
are marked as changed while the display shows the file content.
Patch output features (word diff, function context, color) work
normally; --stat uses its own diff codepath and never consults
the diff process.

The handshake negotiates version=1 and capability=hunks.  Per-file
requests send command=hunks, pathname, and both file contents as
packetized data.  The tool responds with hunk lines and a status
packet (success, error, or abort).  On error, Git warns and falls
back to the builtin diff algorithm for that file.  On abort, Git
silently falls back for the current file and stops sending further
requests to the tool for the remainder of the session.

When the tool returns no hunks followed by status=success, Git
treats the file as having no changes and produces no diff output.
This also means --exit-code reports no changes for that file.

The subprocess is stored on the userdiff_driver struct and
launched on first use.  If the process fails to start, the
handshake fails, or a communication error occurs mid-stream,
the failure is cached on the driver to avoid retrying and
re-warning on every subsequent file.

diff_process_fill_hunks() is the sole public entry point.  It
handles driver lookup, flag checks, subprocess management, and
error reporting, returning an enum that lets callers distinguish
"hunks populated" from "files equivalent" from "not applicable"
from "tool failure."

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 Documentation/config/diff.adoc       |   5 +
 Documentation/gitattributes.adoc     | 143 +++++++++
 Makefile                             |   2 +
 diff-process.c                       | 297 ++++++++++++++++++
 diff-process.h                       |  39 +++
 diff.c                               |  13 +
 diff.h                               |   3 +
 meson.build                          |   1 +
 t/helper/meson.build                 |   1 +
 t/helper/test-diff-process-backend.c | 299 ++++++++++++++++++
 t/helper/test-tool.c                 |   1 +
 t/helper/test-tool.h                 |   1 +
 t/meson.build                        |   1 +
 t/t4080-diff-process.sh              | 432 +++++++++++++++++++++++++++
 userdiff.h                           |   3 +
 15 files changed, 1241 insertions(+)
 create mode 100644 diff-process.c
 create mode 100644 diff-process.h
 create mode 100644 t/helper/test-diff-process-backend.c
 create mode 100755 t/t4080-diff-process.sh

diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc
index 1135a62a0a..ac0635bb3b 100644
--- a/Documentation/config/diff.adoc
+++ b/Documentation/config/diff.adoc
@@ -218,6 +218,11 @@ endif::git-diff[]
 	Set this option to `true` to make the diff driver cache the text
 	conversion outputs.  See linkgit:gitattributes[5] for details.
 
+`diff.<driver>.process`::
+	The command to run as a long-running diff process that
+	provides hunks to Git's diff pipeline.
+	See linkgit:gitattributes[5] for details.
+
 `diff.indentHeuristic`::
 	Set this option to `false` to disable the default heuristics
 	that shift diff hunk boundaries to make patches easier to read.
diff --git a/Documentation/gitattributes.adoc b/Documentation/gitattributes.adoc
index bd76167a45..49ed11d069 100644
--- a/Documentation/gitattributes.adoc
+++ b/Documentation/gitattributes.adoc
@@ -821,6 +821,149 @@ NOTE: If `diff.<name>.command` is defined for path with the
 (see above), and adding `diff.<name>.algorithm` has no effect, as the
 algorithm is not passed to the external diff driver.
 
+Using an external diff process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If `diff.<name>.process` is defined, Git sends the old and new file
+content to an external tool and receives back a list of changed
+regions (pairs of line ranges in the old and new file).  Git uses
+these instead of its builtin diff algorithm, but still controls
+all output formatting, so features like word diff, function context,
+color, and blame work normally.  This is achieved by using the
+long-running process protocol (described in
+Documentation/technical/long-running-process-protocol.adoc).
+Unlike `diff.<name>.command`, which replaces Git's output entirely,
+the diff process feeds results back into the standard pipeline.
+
+First, in `.gitattributes`, assign the `diff` attribute for paths.
+
+------------------------
+*.c diff=cdiff
+------------------------
+
+Then, define a "diff.<name>.process" configuration to specify
+the diff process command.
+
+----------------------------------------------------------------
+[diff "cdiff"]
+  process = /path/to/diff-process-tool
+----------------------------------------------------------------
+
+When Git encounters the first file that needs to be diffed, it starts
+the process and performs the handshake.  In the handshake, the welcome
+message sent by Git is "git-diff-client", only version 1 is supported,
+and the supported capability is "hunks" (the changed regions
+described below).
+
+For each file, Git sends a list of "key=value" pairs terminated with
+a flush packet, followed by the old and new file content as packetized
+data, each terminated with a flush packet.  The pathname is relative
+to the repository root.  When `diff.<name>.textconv` is also set,
+the tool receives the textconv-transformed content rather than the
+raw blob.  Git does not send binary files to the diff process.
+
+-----------------------
+packet:          git> command=hunks
+packet:          git> pathname=path/file.c
+packet:          git> 0000
+packet:          git> OLD_CONTENT
+packet:          git> 0000
+packet:          git> NEW_CONTENT
+packet:          git> 0000
+-----------------------
+
+The tool is expected to respond with zero or more hunk lines,
+a flush packet, and a status packet terminated with a flush packet.
+Each hunk line has the form:
+
+  `hunk <old_start> <old_count> <new_start> <new_count>`
+
+where `<old_start>` and `<old_count>` identify a range of lines in
+the old file, and `<new_start>` and `<new_count>` identify the
+replacement range in the new file.  Start values are 1-based and
+counts are non-negative.  Ranges must not extend beyond the end of
+the file.  For example, `hunk 3 2 3 4` means that 2 lines starting
+at line 3 in the old file were replaced by 4 lines starting at
+line 3 in the new file.  An `<old_count>` of 0 means no lines were
+removed (pure insertion); a `<new_count>` of 0 means no lines were
+added (pure deletion).  A start value of 0 is accepted when
+the corresponding count is 0 (e.g., `hunk 0 0 1 5` for a newly
+added file), matching what `git diff` itself emits for empty
+file sides.
+
+Lines are delimited by newlines.  A file `"foo\nbar\n"` and a
+file `"foo\nbar"` both have 2 lines.
+
+Hunks must be listed in order and must not overlap.  Any line
+not covered by a hunk is treated as unchanged, so the total
+number of unchanged lines must be the same on both sides.
+For example, if the old file has 10 lines and the hunks cover
+4 of them (`old_count` values summing to 4), then 6 old lines
+are unchanged.  The new file must also have exactly 6 lines
+not covered by hunks, so the `new_count` values must sum to
+`new_file_lines - 6`.
+
+-----------------------
+packet:          git< hunk 1 3 1 5
+packet:          git< hunk 10 2 12 2
+packet:          git< 0000
+packet:          git< status=success
+packet:          git< 0000
+-----------------------
+
+If the tool responds with hunks and "success", Git marks those lines
+as changed and feeds them into the standard diff pipeline.  Patch
+output features (word diff, function context, color) work normally.
+Note that `--stat` and other summary formats use their own diff path
+and are not affected by the diff process.
+
+If no hunk lines precede the flush, followed by "success", Git
+treats the files as having no changes: `git diff` produces no output
+and `git blame` skips the commit, attributing lines to earlier commits.
+
+-----------------------
+packet:          git< 0000
+packet:          git< status=success
+packet:          git< 0000
+-----------------------
+
+If the tool returns invalid hunks (out of bounds, overlapping, or
+mismatched unchanged line counts), Git warns and falls back to the
+builtin diff algorithm.
+
+In case the tool cannot or does not want to process the content,
+it is expected to respond with an "error" status.  Git warns and
+falls back to the builtin diff algorithm for this file.  The tool
+remains available for subsequent files.
+
+-----------------------
+packet:          git< 0000
+packet:          git< status=error
+packet:          git< 0000
+-----------------------
+
+In case the tool cannot or does not want to process the content as
+well as any future content for the lifetime of the Git process, it
+is expected to respond with an "abort" status.  Git silently falls
+back to the builtin diff algorithm for this file and does not send
+further requests to the tool.
+
+-----------------------
+packet:          git< 0000
+packet:          git< status=abort
+packet:          git< 0000
+-----------------------
+
+If the tool dies during the communication or does not adhere to the
+protocol then Git will stop the process and fall back to the builtin
+diff algorithm.  Git warns once and does not restart the process for
+subsequent files.
+
+Tools should ignore unknown keys in the per-file request to remain
+forward-compatible.  Future versions of Git may send additional
+`command=` values; tools that receive an unrecognized command should
+respond with `status=error` rather than terminating.
+
 Defining a custom hunk-header
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/Makefile b/Makefile
index 0976a69b4c..5de4718b24 100644
--- a/Makefile
+++ b/Makefile
@@ -811,6 +811,7 @@ TEST_BUILTINS_OBJS += test-csprng.o
 TEST_BUILTINS_OBJS += test-date.o
 TEST_BUILTINS_OBJS += test-delete-gpgsig.o
 TEST_BUILTINS_OBJS += test-delta.o
+TEST_BUILTINS_OBJS += test-diff-process-backend.o
 TEST_BUILTINS_OBJS += test-dir-iterator.o
 TEST_BUILTINS_OBJS += test-drop-caches.o
 TEST_BUILTINS_OBJS += test-dump-cache-tree.o
@@ -1140,6 +1141,7 @@ LIB_OBJS += diff-delta.o
 LIB_OBJS += diff-merges.o
 LIB_OBJS += diff-lib.o
 LIB_OBJS += diff-no-index.o
+LIB_OBJS += diff-process.o
 LIB_OBJS += diff.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
diff --git a/diff-process.c b/diff-process.c
new file mode 100644
index 0000000000..fec63cf233
--- /dev/null
+++ b/diff-process.c
@@ -0,0 +1,297 @@
+/*
+ * Diff process backend: communicates with a long-running external
+ * tool via the pkt-line protocol to obtain custom line-matching
+ * results.  The tool controls which lines are marked as changed
+ * while the display shows the file content (after any textconv
+ * transformation, if configured).
+ *
+ * Protocol: pkt-line over stdin/stdout, following the pattern of
+ * the long-running filter process protocol (see convert.c).
+ *
+ * Handshake:
+ *   git> git-diff-client / version=1 / flush
+ *   tool< git-diff-server / version=1 / flush
+ *   git> capability=hunks / flush
+ *   tool< capability=hunks / flush
+ *
+ * Per-file:
+ *   git> command=hunks / pathname=<path> / flush
+ *   git> <old content packetized> / flush
+ *   git> <new content packetized> / flush
+ *   tool< hunk <old_start> <old_count> <new_start> <new_count>
+ *   tool< ... / flush
+ *   tool< status=success / flush
+ *
+ * When the tool returns no hunks with status=success, it considers
+ * the files equivalent.  Git will skip the diff for that file.
+ */
+
+#include "git-compat-util.h"
+#include "diff-process.h"
+#include "diff.h"
+#include "gettext.h"
+#include "repository.h"
+#include "sigchain.h"
+#include "userdiff.h"
+#include "sub-process.h"
+#include "pkt-line.h"
+#include "strbuf.h"
+#include "xdiff/xdiff.h"
+
+#define CAP_HUNKS (1u << 0)
+
+struct diff_subprocess {
+	struct subprocess_entry subprocess;
+	unsigned int supported_capabilities;
+};
+
+static int start_diff_process_fn(struct subprocess_entry *subprocess)
+{
+	static int versions[] = { 1, 0 };
+	static struct subprocess_capability capabilities[] = {
+		{ "hunks", CAP_HUNKS },
+		{ NULL, 0 }
+	};
+	struct diff_subprocess *entry =
+		container_of(subprocess, struct diff_subprocess, subprocess);
+
+	return subprocess_handshake(subprocess, "git-diff",
+				    versions, NULL,
+				    capabilities,
+				    &entry->supported_capabilities);
+}
+
+static struct diff_subprocess *get_or_launch_process(
+		struct userdiff_driver *drv)
+{
+	struct diff_subprocess *entry;
+
+	if (drv->diff_subprocess)
+		return drv->diff_subprocess;
+
+	entry = xcalloc(1, sizeof(*entry));
+	if (subprocess_start_command(&entry->subprocess, drv->process,
+				     start_diff_process_fn)) {
+		free(entry);
+		drv->diff_process_failed = 1;
+		return NULL;
+	}
+
+	drv->diff_subprocess = entry;
+	return entry;
+}
+
+static int send_file_content(int fd, const char *buf, long size)
+{
+	int ret = 0;
+
+	if (size < 0)
+		return -1;
+	if (size > 0)
+		ret = write_packetized_from_buf_no_flush(buf, size, fd);
+	if (ret)
+		return ret;
+	return packet_flush_gently(fd);
+}
+
+static int parse_hunk_line(const char *line, struct xdl_hunk *hunk)
+{
+	char *end;
+
+	/*
+	 * Format: "hunk <old_start> <old_count> <new_start> <new_count>"
+	 * All numbers must be non-negative decimal with no leading
+	 * whitespace or sign characters.
+	 */
+	if (!skip_prefix(line, "hunk ", &line))
+		return -1;
+
+	if (!isdigit(*line))
+		return -1;
+	errno = 0;
+	hunk->old_start = strtol(line, &end, 10);
+	if (errno || end == line || *end++ != ' ')
+		return -1;
+	line = end;
+
+	if (!isdigit(*line))
+		return -1;
+	errno = 0;
+	hunk->old_count = strtol(line, &end, 10);
+	if (errno || end == line || *end++ != ' ')
+		return -1;
+	line = end;
+
+	if (!isdigit(*line))
+		return -1;
+	errno = 0;
+	hunk->new_start = strtol(line, &end, 10);
+	if (errno || end == line || *end++ != ' ')
+		return -1;
+	line = end;
+
+	if (!isdigit(*line))
+		return -1;
+	errno = 0;
+	hunk->new_count = strtol(line, &end, 10);
+	if (errno || end == line || *end != '\0')
+		return -1;
+
+	/*
+	 * git diff emits start=0 when count=0 (empty file side).
+	 * Normalize to 1-based so downstream validation can assume start >= 1.
+	 */
+	if (!hunk->old_count && !hunk->old_start)
+		hunk->old_start = 1;
+	if (!hunk->new_count && !hunk->new_start)
+		hunk->new_start = 1;
+
+	return 0;
+}
+
+static enum diff_process_result get_hunks(
+		struct userdiff_driver *drv,
+		const char *path,
+		const char *old_buf, long old_size,
+		const char *new_buf, long new_size,
+		struct xdl_hunk **hunks_out,
+		size_t *nr_hunks_out)
+{
+	struct diff_subprocess *backend;
+	struct child_process *process;
+	int fd_in, fd_out;
+	struct strbuf status = STRBUF_INIT;
+	struct xdl_hunk *hunks = NULL;
+	struct xdl_hunk hunk;
+	size_t nr_hunks = 0, alloc_hunks = 0;
+	int len;
+	char *line;
+
+	backend = get_or_launch_process(drv);
+	if (!backend)
+		return DIFF_PROCESS_ERROR;
+
+	if (!(backend->supported_capabilities & CAP_HUNKS))
+		return DIFF_PROCESS_SKIP;
+
+	process = subprocess_get_child_process(&backend->subprocess);
+	fd_in = process->in;
+	fd_out = process->out;
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	/* Send request */
+	if (packet_write_fmt_gently(fd_in, "command=hunks\n") ||
+	    packet_write_fmt_gently(fd_in, "pathname=%s\n", path) ||
+	    packet_flush_gently(fd_in))
+		goto comm_error;
+
+	/* Send old file content */
+	if (send_file_content(fd_in, old_buf, old_size))
+		goto comm_error;
+
+	/* Send new file content */
+	if (send_file_content(fd_in, new_buf, new_size))
+		goto comm_error;
+
+	/* Read hunks until flush packet */
+	while ((len = packet_read_line_gently(fd_out, NULL, &line)) >= 0 &&
+	       line) {
+		if (parse_hunk_line(line, &hunk) < 0)
+			goto comm_error;
+		ALLOC_GROW(hunks, nr_hunks + 1, alloc_hunks);
+		hunks[nr_hunks++] = hunk;
+	}
+	if (len < 0)
+		goto comm_error;
+
+	/* Read status */
+	if (subprocess_read_status(fd_out, &status))
+		goto comm_error;
+
+	if (!strcmp(status.buf, "success")) {
+		*hunks_out = hunks;
+		*nr_hunks_out = nr_hunks;
+		strbuf_release(&status);
+		sigchain_pop(SIGPIPE);
+		return DIFF_PROCESS_OK;
+	}
+
+	if (!strcmp(status.buf, "abort")) {
+		/*
+		 * The tool voluntarily withdrew: stop sending requests
+		 * but do not warn (this is not a failure).
+		 */
+		backend->supported_capabilities &= ~CAP_HUNKS;
+		free(hunks);
+		strbuf_release(&status);
+		sigchain_pop(SIGPIPE);
+		return DIFF_PROCESS_SKIP;
+	}
+
+	/* status=error or unknown status */
+	free(hunks);
+	strbuf_release(&status);
+	sigchain_pop(SIGPIPE);
+	return DIFF_PROCESS_ERROR;
+
+comm_error:
+	/*
+	 * Communication failure (broken pipe, malformed response).
+	 * Tear down the process and mark as failed so we do not
+	 * retry on every subsequent file.
+	 */
+	drv->diff_process_failed = 1;
+	drv->diff_subprocess = NULL;
+	subprocess_stop_command(&backend->subprocess);
+	free(backend);
+	free(hunks);
+	strbuf_release(&status);
+	sigchain_pop(SIGPIPE);
+	return DIFF_PROCESS_ERROR;
+}
+
+enum diff_process_result diff_process_fill_hunks(
+		struct diff_options *diffopt,
+		const char *path,
+		const mmfile_t *file_a,
+		const mmfile_t *file_b,
+		xpparam_t *xpp)
+{
+	struct userdiff_driver *drv;
+	struct xdl_hunk *ext_hunks = NULL;
+	size_t nr = 0;
+	enum diff_process_result res;
+
+	if (!diffopt || !path)
+		return DIFF_PROCESS_SKIP;
+	if (diffopt->flags.no_diff_process || diffopt->ignore_driver_algorithm)
+		return DIFF_PROCESS_SKIP;
+
+	drv = userdiff_find_by_path(diffopt->repo->index, path);
+	if (!drv || !drv->process)
+		return DIFF_PROCESS_SKIP;
+	if (drv->diff_process_failed)
+		return DIFF_PROCESS_SKIP;
+
+	res = get_hunks(drv, path,
+			file_a->ptr, file_a->size,
+			file_b->ptr, file_b->size,
+			&ext_hunks, &nr);
+	if (res == DIFF_PROCESS_OK) {
+		if (!nr) {
+			free(ext_hunks);
+			return DIFF_PROCESS_EQUIVALENT;
+		}
+		xpp->external_hunks = ext_hunks;
+		xpp->external_hunks_nr = nr;
+		return DIFF_PROCESS_OK;
+	}
+	if (res == DIFF_PROCESS_ERROR) {
+		warning(_("diff process '%s' failed for '%s',"
+			  " falling back to builtin diff"),
+			drv->process, path);
+		return DIFF_PROCESS_ERROR;
+	}
+	return DIFF_PROCESS_SKIP;
+}
diff --git a/diff-process.h b/diff-process.h
new file mode 100644
index 0000000000..d34b42f811
--- /dev/null
+++ b/diff-process.h
@@ -0,0 +1,39 @@
+#ifndef DIFF_PROCESS_H
+#define DIFF_PROCESS_H
+
+#include "xdiff/xdiff.h"
+
+struct diff_options;
+
+enum diff_process_result {
+	DIFF_PROCESS_ERROR = -1, /* tool failure: warned, fell back */
+	DIFF_PROCESS_OK = 0,     /* hunks populated in xpp */
+	DIFF_PROCESS_SKIP,       /* no process configured: use builtin */
+	DIFF_PROCESS_EQUIVALENT, /* tool says files are equivalent */
+};
+
+/*
+ * Consult the diff process configured for 'path' and populate
+ * xpp->external_hunks with the returned hunks.
+ *
+ * Handles driver lookup, flag checks (--no-ext-diff,
+ * --diff-algorithm), subprocess management, and error reporting.
+ *
+ * Returns DIFF_PROCESS_OK when hunks are populated in xpp.
+ * The caller owns xpp->external_hunks and must free() it.
+ *
+ * Returns DIFF_PROCESS_EQUIVALENT when the tool returns no hunks
+ * (files are considered identical); caller should skip diff/blame.
+ * Returns DIFF_PROCESS_SKIP when no process applies; caller
+ * should use the builtin diff algorithm.
+ * Returns DIFF_PROCESS_ERROR on tool failure (already warned);
+ * caller should fall back to the builtin diff algorithm.
+ */
+enum diff_process_result diff_process_fill_hunks(
+		struct diff_options *diffopt,
+		const char *path,
+		const mmfile_t *file_a,
+		const mmfile_t *file_b,
+		xpparam_t *xpp);
+
+#endif /* DIFF_PROCESS_H */
diff --git a/diff.c b/diff.c
index 5a584fa1d5..3d97a188b9 100644
--- a/diff.c
+++ b/diff.c
@@ -25,6 +25,7 @@
 #include "utf8.h"
 #include "odb.h"
 #include "userdiff.h"
+#include "diff-process.h"
 #include "submodule.h"
 #include "hashmap.h"
 #include "mem-pool.h"
@@ -4054,6 +4055,17 @@ static void builtin_diff(const char *name_a,
 		xpp.ignore_regex_nr = o->ignore_regex_nr;
 		xpp.anchors = o->anchors;
 		xpp.anchors_nr = o->anchors_nr;
+
+		if (diff_process_fill_hunks(o, name_a,
+					    &mf1, &mf2, &xpp)
+		    == DIFF_PROCESS_EQUIVALENT) {
+			if (textconv_one)
+				free(mf1.ptr);
+			if (textconv_two)
+				free(mf2.ptr);
+			goto free_ab_and_return;
+		}
+
 		xecfg.ctxlen = o->context;
 		xecfg.interhunkctxlen = o->interhunkcontext;
 		xecfg.flags = XDL_EMIT_FUNCNAMES;
@@ -4134,6 +4146,7 @@ static void builtin_diff(const char *name_a,
 		} else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
 					 &ecbdata, &xpp, &xecfg))
 			die("unable to generate diff for %s", one->path);
+		free(xpp.external_hunks);
 		if (o->word_diff)
 			free_diff_words_data(&ecbdata);
 		if (textconv_one)
diff --git a/diff.h b/diff.h
index bb5cddaf34..7dc157968d 100644
--- a/diff.h
+++ b/diff.h
@@ -173,6 +173,9 @@ struct diff_flags {
 	 */
 	unsigned allow_external;
 
+	/** Disables diff.<driver>.process. */
+	unsigned no_diff_process;
+
 	/**
 	 * For communication between the calling program and the options parser;
 	 * tell the calling program to signal the presence of difference using
diff --git a/meson.build b/meson.build
index 3247697f74..aa532f5200 100644
--- a/meson.build
+++ b/meson.build
@@ -328,6 +328,7 @@ libgit_sources = [
   'diff-merges.c',
   'diff-lib.c',
   'diff-no-index.c',
+  'diff-process.c',
   'diff.c',
   'diffcore-break.c',
   'diffcore-delta.c',
diff --git a/t/helper/meson.build b/t/helper/meson.build
index 3235f10ab8..6abcda4afb 100644
--- a/t/helper/meson.build
+++ b/t/helper/meson.build
@@ -12,6 +12,7 @@ test_tool_sources = [
   'test-date.c',
   'test-delete-gpgsig.c',
   'test-delta.c',
+  'test-diff-process-backend.c',
   'test-dir-iterator.c',
   'test-drop-caches.c',
   'test-dump-cache-tree.c',
diff --git a/t/helper/test-diff-process-backend.c b/t/helper/test-diff-process-backend.c
new file mode 100644
index 0000000000..ad392694e6
--- /dev/null
+++ b/t/helper/test-diff-process-backend.c
@@ -0,0 +1,299 @@
+/*
+ * Test backend for the long-running diff process protocol
+ * (see diff-process.c and Documentation/gitattributes.adoc).
+ *
+ * Usage: test-tool diff-process-backend --mode=<mode> [--log=<path>]
+ *
+ * Implements the server side of the pkt-line handshake and a per-file
+ * response loop.  The --mode= switch selects the response shape
+ * (success, error, abort, crash, malformed hunks).
+ *
+ * Per-file request from Git:
+ *
+ *   packet:          git> command=hunks
+ *   packet:          git> pathname=<path>
+ *   packet:          git> 0000
+ *   packet:          git> OLD_CONTENT
+ *   packet:          git> 0000
+ *   packet:          git> NEW_CONTENT
+ *   packet:          git> 0000
+ *
+ * Response varies by --mode (default: whole-file):
+ *
+ *   whole-file   packet: git< hunk 1 <old_lines> 1 <new_lines>
+ *   fixed-hunk   packet: git< hunk 5 2 5 2
+ *   no-hunks     (no hunk packets)
+ *   bad-hunk     packet: git< hunk 999 1 999 1
+ *   bad-parse    packet: git< garbage not a hunk
+ *   bad-sync     packet: git< hunk 1 2 1 1
+ *   overlap      packet: git< hunk 1 5 1 5
+ *                packet: git< hunk 3 2 3 2
+ *   no-cap       (omits capability=hunks during handshake)
+ *   error        (status=error instead of status=success)
+ *   abort        (status=abort instead of status=success)
+ *   crash        exit(1) before sending any response
+ *
+ * All non-error/abort modes end with:
+ *
+ *   packet:          git< 0000
+ *   packet:          git< status=success
+ *   packet:          git< 0000
+ *
+ * Each request is logged to --log as:
+ *
+ *   command=<cmd> pathname=<path> old=<first line> new=<first line>
+ */
+
+#include "test-tool.h"
+#include "pkt-line.h"
+#include "parse-options.h"
+#include "strbuf.h"
+
+static FILE *logfile;
+
+enum mode {
+	MODE_WHOLE_FILE,
+	MODE_FIXED_HUNK,
+	MODE_NO_HUNKS,
+	MODE_BAD_HUNK,
+	MODE_BAD_PARSE,
+	MODE_BAD_SYNC,
+	MODE_OVERLAP,
+	MODE_NO_CAP,
+	MODE_ERROR,
+	MODE_ABORT,
+	MODE_CRASH,
+};
+
+static enum mode parse_mode(const char *s)
+{
+	if (!strcmp(s, "whole-file"))
+		return MODE_WHOLE_FILE;
+	if (!strcmp(s, "fixed-hunk"))
+		return MODE_FIXED_HUNK;
+	if (!strcmp(s, "no-hunks"))
+		return MODE_NO_HUNKS;
+	if (!strcmp(s, "bad-hunk"))
+		return MODE_BAD_HUNK;
+	if (!strcmp(s, "bad-parse"))
+		return MODE_BAD_PARSE;
+	if (!strcmp(s, "bad-sync"))
+		return MODE_BAD_SYNC;
+	if (!strcmp(s, "overlap"))
+		return MODE_OVERLAP;
+	if (!strcmp(s, "no-cap"))
+		return MODE_NO_CAP;
+	if (!strcmp(s, "error"))
+		return MODE_ERROR;
+	if (!strcmp(s, "abort"))
+		return MODE_ABORT;
+	if (!strcmp(s, "crash"))
+		return MODE_CRASH;
+	die("unknown --mode=%s", s);
+}
+
+/*
+ * Read "key=value" packets up to a flush, capturing "command" and
+ * "pathname".  Returns 1 if a request was read, 0 on EOF.
+ *
+ * The first packet uses the gentle variant so that a clean shutdown
+ * by Git (EOF) does not produce a spurious "the remote end hung up
+ * unexpectedly" on stderr.  Subsequent packets use the non-gentle
+ * variant: once inside a request, truncation is a protocol violation
+ * and dying loudly is the correct response.
+ */
+static int read_request_header(char **command, char **pathname)
+{
+	int first = 1;
+	char *line;
+
+	*command = *pathname = NULL;
+	for (;;) {
+		const char *value;
+
+		if (first) {
+			if (packet_read_line_gently(0, NULL, &line) < 0)
+				return 0;
+			first = 0;
+		} else {
+			line = packet_read_line(0, NULL);
+		}
+		if (!line)
+			break;
+		if (skip_prefix(line, "command=", &value))
+			*command = xstrdup(value);
+		else if (skip_prefix(line, "pathname=", &value))
+			*pathname = xstrdup(value);
+	}
+	return 1;
+}
+
+static size_t count_lines(const struct strbuf *buf)
+{
+	size_t lines = 0;
+
+	for (size_t i = 0; i < buf->len; i++)
+		if (buf->buf[i] == '\n')
+			lines++;
+
+	return lines + (buf->len > 0 && buf->buf[buf->len - 1] != '\n');
+}
+
+static void send_status(const char *status)
+{
+	packet_flush(1);
+	packet_write_fmt(1, "%s\n", status);
+	packet_flush(1);
+}
+
+static void respond(enum mode mode,
+		    const struct strbuf *old_buf,
+		    const struct strbuf *new_buf)
+{
+	switch (mode) {
+	case MODE_ERROR:
+		send_status("status=error");
+		return;
+	case MODE_ABORT:
+		send_status("status=abort");
+		return;
+	case MODE_CRASH:
+		exit(1);
+	case MODE_FIXED_HUNK:
+		packet_write_fmt(1, "hunk 5 2 5 2\n");
+		break;
+	case MODE_BAD_HUNK:
+		packet_write_fmt(1, "hunk 999 1 999 1\n");
+		break;
+	case MODE_BAD_PARSE:
+		packet_write_fmt(1, "garbage not a hunk\n");
+		break;
+	case MODE_BAD_SYNC:
+		packet_write_fmt(1, "hunk 1 2 1 1\n");
+		break;
+	case MODE_OVERLAP:
+		packet_write_fmt(1, "hunk 1 5 1 5\n");
+		packet_write_fmt(1, "hunk 3 2 3 2\n");
+		break;
+	case MODE_NO_HUNKS:
+		break;
+	case MODE_NO_CAP:
+	case MODE_WHOLE_FILE: {
+		size_t old_lines = count_lines(old_buf);
+		size_t new_lines = count_lines(new_buf);
+		/*
+		 * Match git diff output: start=0 when count=0
+		 * (empty file side), 1 otherwise.
+		 */
+		packet_write_fmt(1, "hunk %"PRIuMAX" %"PRIuMAX
+				 " %"PRIuMAX" %"PRIuMAX"\n",
+				 (uintmax_t)(old_lines ? 1 : 0),
+				 (uintmax_t)old_lines,
+				 (uintmax_t)(new_lines ? 1 : 0),
+				 (uintmax_t)new_lines);
+		break;
+	}
+	}
+	send_status("status=success");
+}
+
+static void command_loop(enum mode mode)
+{
+	for (;;) {
+		char *command = NULL, *pathname = NULL;
+		struct strbuf obuf = STRBUF_INIT;
+		struct strbuf nbuf = STRBUF_INIT;
+
+		if (!read_request_header(&command, &pathname))
+			break; /* EOF: Git closed its end */
+
+		read_packetized_to_strbuf(0, &obuf, 0);
+		read_packetized_to_strbuf(0, &nbuf, 0);
+
+		if (logfile) {
+			fprintf(logfile,
+				"command=%s pathname=%s old=%.*s new=%.*s\n",
+				command ? command : "(none)",
+				pathname ? pathname : "(none)",
+				(int)(strchrnul(obuf.buf, '\n') - obuf.buf),
+				obuf.buf,
+				(int)(strchrnul(nbuf.buf, '\n') - nbuf.buf),
+				nbuf.buf);
+			fflush(logfile);
+		}
+
+		respond(mode, &obuf, &nbuf);
+
+		free(command);
+		free(pathname);
+		strbuf_release(&obuf);
+		strbuf_release(&nbuf);
+	}
+}
+
+static void handshake(enum mode mode)
+{
+	char *line;
+
+	line = packet_read_line(0, NULL);
+	if (!line || strcmp(line, "git-diff-client"))
+		die("bad welcome: '%s'", line ? line : "(eof)");
+	line = packet_read_line(0, NULL);
+	if (!line || strcmp(line, "version=1"))
+		die("bad version: '%s'", line ? line : "(eof)");
+	if (packet_read_line(0, NULL))
+		die("expected flush after version");
+
+	packet_write_fmt(1, "git-diff-server\n");
+	packet_write_fmt(1, "version=1\n");
+	packet_flush(1);
+
+	/* Drain capabilities advertised by Git */
+	while ((line = packet_read_line(0, NULL)))
+		; /* drain */
+
+	/* Respond with our capabilities (or none for no-cap mode) */
+	if (mode != MODE_NO_CAP)
+		packet_write_fmt(1, "capability=hunks\n");
+	packet_flush(1);
+}
+
+static const char *const usage_str[] = {
+	"test-tool diff-process-backend --mode=<mode> [--log=<path>]",
+	NULL
+};
+
+int cmd__diff_process_backend(int argc, const char **argv)
+{
+	const char *mode_str = NULL, *log_path = NULL;
+	enum mode mode = MODE_WHOLE_FILE;
+	struct option options[] = {
+		OPT_STRING(0, "mode", &mode_str, "mode",
+			   "response shape: whole-file (default), fixed-hunk,"
+			   " no-hunks, bad-hunk, bad-sync, overlap, error,"
+			   " abort, crash"),
+		OPT_STRING(0, "log", &log_path, "path",
+			   "append per-request summary to this file"),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, NULL, options, usage_str, 0);
+	if (argc)
+		usage_with_options(usage_str, options);
+
+	if (mode_str)
+		mode = parse_mode(mode_str);
+
+	if (log_path) {
+		logfile = fopen(log_path, "a");
+		if (!logfile)
+			die_errno("failed to open log '%s'", log_path);
+	}
+
+	handshake(mode);
+	command_loop(mode);
+
+	if (logfile && fclose(logfile))
+		die_errno("error closing log");
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index b71a22b43b..3c3f95269c 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -22,6 +22,7 @@ static struct test_cmd cmds[] = {
 	{ "date", cmd__date },
 	{ "delete-gpgsig", cmd__delete_gpgsig },
 	{ "delta", cmd__delta },
+	{ "diff-process-backend", cmd__diff_process_backend },
 	{ "dir-iterator", cmd__dir_iterator },
 	{ "drop-caches", cmd__drop_caches },
 	{ "dump-cache-tree", cmd__dump_cache_tree },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index f2885b33d5..a5bb755516 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -15,6 +15,7 @@ int cmd__csprng(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
 int cmd__delete_gpgsig(int argc, const char **argv);
+int cmd__diff_process_backend(int argc, const char **argv);
 int cmd__dir_iterator(int argc, const char **argv);
 int cmd__drop_caches(int argc, const char **argv);
 int cmd__dump_cache_tree(int argc, const char **argv);
diff --git a/t/meson.build b/t/meson.build
index c5832fee05..027855ced7 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -512,6 +512,7 @@ integration_tests = [
   't4072-diff-max-depth.sh',
   't4073-diff-stat-name-width.sh',
   't4074-diff-shifted-matched-group.sh',
+  't4080-diff-process.sh',
   't4100-apply-stat.sh',
   't4101-apply-nonl.sh',
   't4102-apply-rename.sh',
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
new file mode 100755
index 0000000000..9bb579b564
--- /dev/null
+++ b/t/t4080-diff-process.sh
@@ -0,0 +1,432 @@
+#!/bin/sh
+
+test_description='diff process via long-running process'
+
+. ./test-lib.sh
+
+# See t/helper/test-diff-process-backend.c for the backend implementation
+# and available --mode= options.
+
+BACKEND="test-tool diff-process-backend"
+
+test_expect_success 'setup' '
+	echo "*.c diff=cdiff" >.gitattributes &&
+	git add .gitattributes &&
+
+	# boundary.c: 10 lines, changes at 5-6 and 9-10.
+	# Used by: hunk boundaries, error fallback, crash, bad hunks, overlap.
+	cat >boundary.c <<-\EOF &&
+	line1
+	line2
+	line3
+	line4
+	OLD5
+	OLD6
+	line7
+	line8
+	OLD9
+	OLD10
+	EOF
+	git add boundary.c &&
+
+	# worddiff.c: single-line function, value changes 1 -> 999.
+	# Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat.
+	cat >worddiff.c <<-\EOF &&
+	int value(void) { return 1; }
+	EOF
+	git add worddiff.c &&
+
+	# newfile.c: single-line function, value changes 42 -> 99.
+	# Used by: modified file, --exit-code, multiple drivers.
+	cat >newfile.c <<-\EOF &&
+	int new_func(void) { return 42; }
+	EOF
+	git add newfile.c &&
+
+	# logtest.c: single-line function for log/format-patch tests.
+	# Needs two commits so log -1 has a diff.
+	cat >logtest.c <<-\EOF &&
+	int logfunc(void) { return 1; }
+	EOF
+	git add logtest.c &&
+
+	# two.c/one.c: two-file pair for error/abort/startup-failure tests.
+	cat >one.c <<-\EOF &&
+	int first(void) { return 1; }
+	EOF
+	cat >two.c <<-\EOF &&
+	int second(void) { return 2; }
+	EOF
+	git add one.c two.c &&
+
+	git commit -m "initial" &&
+
+	# Second commit for logtest.c (so log -1 has something to show).
+	cat >logtest.c <<-\EOF &&
+	int logfunc(void) { return 2; }
+	EOF
+	git add logtest.c &&
+	git commit -m "change logtest.c" &&
+
+	# Working tree modifications (not committed).
+	cat >boundary.c <<-\EOF &&
+	line1
+	line2
+	line3
+	line4
+	NEW5
+	NEW6
+	line7
+	line8
+	NEW9
+	NEW10
+	EOF
+
+	cat >worddiff.c <<-\EOF &&
+	int value(void) { return 999; }
+	EOF
+
+	cat >newfile.c <<-\EOF &&
+	int new_func(void) { return 99; }
+	EOF
+
+	cat >one.c <<-\EOF &&
+	int first(void) { return 10; }
+	EOF
+
+	cat >two.c <<-\EOF
+	int second(void) { return 20; }
+	EOF
+'
+
+#
+# Core behavior: the tool controls which lines are marked as changed.
+#
+
+test_expect_success 'diff process hunk boundaries affect output' '
+	# The file has changes at lines 5-6 and 9-10, but fixed-hunk
+	# only reports lines 5-6 as changed.  Lines 9-10 should not
+	# appear as changed in the output.
+	git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
+		diff boundary.c >actual &&
+	test_grep "^-OLD5" actual &&
+	test_grep "^-OLD6" actual &&
+	test_grep "^+NEW5" actual &&
+	test_grep "^+NEW6" actual &&
+	test_grep ! "^-OLD9" actual &&
+	test_grep ! "^-OLD10" actual &&
+	test_grep ! "^+NEW9" actual &&
+	test_grep ! "^+NEW10" actual
+'
+
+test_expect_success 'diff process works with modified file' '
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff -- newfile.c >actual 2>stderr &&
+	test_grep "return 99" actual &&
+	test_grep "pathname=newfile.c" backend.log &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'diff process works with added file (empty old side)' '
+	cat >added.c <<-\EOF &&
+	int added(void) { return 1; }
+	EOF
+	git add added.c &&
+
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff --cached -- added.c >actual 2>stderr &&
+	test_grep "added" actual &&
+	test_grep "pathname=added.c" backend.log &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'diff process works with deleted file (empty new side)' '
+	git add added.c &&
+	git commit -m "commit added.c" &&
+	git rm added.c &&
+
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff --cached -- added.c >actual 2>stderr &&
+	test_grep "deleted file" actual &&
+	test_grep "pathname=added.c" backend.log &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'diff process skipped for binary files' '
+	printf "\\0binary" >binary.c &&
+	git add binary.c &&
+	git commit -m "add binary" &&
+	printf "\\0changed" >binary.c &&
+
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff -- binary.c >actual &&
+	test_grep "Binary files" actual &&
+	test_path_is_missing backend.log
+'
+
+test_expect_success 'diff process not consulted for unmatched driver' '
+	echo "not tracked by cdiff" >unmatched.txt &&
+	git add unmatched.txt &&
+	git commit -m "add unmatched.txt" &&
+
+	echo "modified" >unmatched.txt &&
+
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff -- unmatched.txt >actual &&
+	test_grep "modified" actual &&
+	test_path_is_missing backend.log
+'
+
+test_expect_success 'multiple drivers use separate processes' '
+	echo "*.h diff=hdiff" >>.gitattributes &&
+	git add .gitattributes &&
+
+	cat >multi.h <<-\EOF &&
+	int header(void) { return 1; }
+	EOF
+	git add multi.h &&
+	git commit -m "add multi.h" &&
+
+	cat >multi.h <<-\EOF &&
+	int header(void) { return 2; }
+	EOF
+
+	test_when_finished "rm -f backend-c.log backend-h.log" &&
+	git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \
+	    -c diff.hdiff.process="$BACKEND --log=backend-h.log" \
+		diff -- newfile.c multi.h >actual 2>stderr &&
+	test_grep "pathname=newfile.c" backend-c.log &&
+	test_grep "pathname=multi.h" backend-h.log &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'diff process works alongside textconv' '
+	write_script uppercase-filter <<-\EOF &&
+	tr "a-z" "A-Z" <"$1"
+	EOF
+
+	cat >textconv.c <<-\EOF &&
+	hello world
+	EOF
+	git add textconv.c &&
+	git commit -m "add textconv.c" &&
+
+	cat >textconv.c <<-\EOF &&
+	goodbye world
+	EOF
+
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.textconv="./uppercase-filter" \
+	    -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff -- textconv.c >actual 2>stderr &&
+	# The diff process receives textconv-transformed (uppercase) content.
+	test_grep "pathname=textconv.c" backend.log &&
+	test_grep "old=HELLO WORLD" backend.log &&
+	test_grep "new=GOODBYE WORLD" backend.log &&
+	test_must_be_empty stderr
+'
+
+#
+# Downstream features: word diff, log, equivalent files, exit code.
+#
+
+test_expect_success 'diff process with --word-diff' '
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff --word-diff worddiff.c >actual 2>stderr &&
+	test_grep "\[-1;-\]" actual &&
+	test_grep "{+999;+}" actual &&
+	test_grep "pathname=worddiff.c" backend.log &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'diff process works with git log -p' '
+	# With no-hunks mode, the tool says the files are equivalent,
+	# so log -p should show the commit but no diff content.
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
+		log -1 -p -- logtest.c >actual 2>stderr &&
+	test_grep "change logtest.c" actual &&
+	test_grep ! "return 2" actual &&
+	test_grep "command=hunks pathname=logtest.c" backend.log &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'diff process no hunks suppresses diff output' '
+	cat >nohunks.c <<-\EOF &&
+	int zero(void) { return 0; }
+	EOF
+	git add nohunks.c &&
+	git commit -m "add nohunks.c" &&
+
+	cat >nohunks.c <<-\EOF &&
+	int zero(void) { return 999; }
+	EOF
+
+	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
+		diff nohunks.c >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'diff process no hunks with --exit-code returns success' '
+	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
+		diff --exit-code nohunks.c
+'
+
+test_expect_success 'diff process with --exit-code and hunks returns failure' '
+	test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \
+		diff --exit-code newfile.c
+'
+
+#
+# Bypass mechanisms: flags and commands that skip the diff process.
+#
+
+test_expect_success 'diff process bypassed by --diff-algorithm' '
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff --diff-algorithm=patience worddiff.c >actual &&
+	test_grep "return 999" actual &&
+	test_path_is_missing backend.log
+'
+
+test_expect_success 'diff process not used by --stat' '
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff --stat worddiff.c >actual &&
+	test_grep "worddiff.c" actual &&
+	test_path_is_missing backend.log
+'
+
+#
+# Error handling and fallback.
+#
+
+test_expect_success 'diff process fallback on tool error status' '
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
+		diff boundary.c >actual 2>stderr &&
+	# Fallback produces the full builtin diff (both change regions).
+	test_grep "^-OLD5" actual &&
+	test_grep "^+NEW5" actual &&
+	test_grep "^-OLD9" actual &&
+	test_grep "^+NEW9" actual &&
+	# Tool was contacted (it replied with error, not crash).
+	test_grep "command=hunks pathname=boundary.c" backend.log &&
+	test_grep "diff process.*failed" stderr
+'
+
+test_expect_success 'diff process error keeps tool available for next file' '
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
+		diff -- one.c two.c >actual 2>stderr &&
+	# Unlike abort, error keeps the tool available: both files
+	# are sent to the tool (and both fall back).
+	test_grep "pathname=one.c" backend.log &&
+	test_grep "pathname=two.c" backend.log &&
+	test_grep "return 10" actual &&
+	test_grep "return 20" actual &&
+	test_grep "diff process.*failed" stderr
+'
+
+test_expect_success 'diff process abort disables for session' '
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
+		diff -- one.c two.c >actual 2>stderr &&
+	# Both files should still produce diff output via fallback.
+	test_grep "return 10" actual &&
+	test_grep "return 20" actual &&
+	# The tool aborts on the first file and git clears its
+	# capability.  The second file never contacts the tool.
+	test_grep "pathname=one.c" backend.log &&
+	test_grep ! "pathname=two.c" backend.log &&
+	test_must_be_empty stderr
+'
+
+test_expect_success 'diff process fallback on tool crash' '
+	git -c diff.cdiff.process="$BACKEND --mode=crash" \
+		diff boundary.c >actual 2>stderr &&
+	test_grep "^-OLD5" actual &&
+	test_grep "^+NEW5" actual &&
+	test_grep "^-OLD9" actual &&
+	test_grep "^+NEW9" actual &&
+	# Crash is a communication failure, so a warning is emitted.
+	test_grep "diff process.*failed" stderr
+'
+
+test_expect_success 'diff process startup failure only warns once' '
+	git -c diff.cdiff.process="/nonexistent/tool" \
+		diff -- one.c two.c >actual 2>stderr &&
+	# Both files produce diff output via fallback.
+	test_grep "return 10" actual &&
+	test_grep "return 20" actual &&
+	# Sentinel prevents repeated warnings: only one, not one per file.
+	test_grep "diff process.*failed" stderr >warnings &&
+	test_line_count = 1 warnings
+'
+
+
+test_expect_success 'diff process fallback on bad hunks' '
+	git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
+		diff boundary.c >actual 2>stderr &&
+	test_grep "^-OLD5" actual &&
+	test_grep "^+NEW5" actual &&
+	test_grep "^-OLD9" actual &&
+	test_grep "^+NEW9" actual &&
+	test_grep "exceeds.*lines" stderr
+'
+
+test_expect_success 'diff process fallback on mismatched unchanged totals' '
+	cat >synctest.c <<-\EOF &&
+	line1
+	line2
+	line3
+	EOF
+	git add synctest.c &&
+	git commit -m "add synctest.c" &&
+
+	cat >synctest.c <<-\EOF &&
+	line1
+	changed
+	line3
+	EOF
+
+	# bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new
+	# line as changed, leaving 1 unchanged old vs 2 unchanged new.
+	# The synchronization invariant fails and git falls back.
+	git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \
+		diff synctest.c >actual 2>stderr &&
+	test_grep "changed" actual &&
+	test_grep "unchanged line count mismatch" stderr
+'
+
+test_expect_success 'diff process fallback on overlapping hunks' '
+	# boundary.c has 10 lines, so both hunks are in bounds
+	# but they overlap at lines 3-5, triggering the ordering check.
+	git -c diff.cdiff.process="$BACKEND --mode=overlap" \
+		diff boundary.c >actual 2>stderr &&
+	test_grep "NEW5" actual &&
+	test_grep "overlaps with previous" stderr
+'
+
+test_expect_success 'diff process fallback on malformed hunk line' '
+	git -c diff.cdiff.process="$BACKEND --mode=bad-parse" \
+		diff boundary.c >actual 2>stderr &&
+	test_grep "^-OLD5" actual &&
+	test_grep "^+NEW5" actual
+'
+
+test_expect_success 'diff process skipped when tool omits capability' '
+	git -c diff.cdiff.process="$BACKEND --mode=no-cap" \
+		diff boundary.c >actual 2>stderr &&
+	test_grep "^-OLD5" actual &&
+	test_grep "^+NEW5" actual &&
+	test_must_be_empty stderr
+'
+
+test_done
diff --git a/userdiff.h b/userdiff.h
index 51c26e0d41..a98eabe377 100644
--- a/userdiff.h
+++ b/userdiff.h
@@ -3,6 +3,7 @@
 
 #include "notes-cache.h"
 
+struct diff_subprocess;
 struct index_state;
 struct repository;
 
@@ -33,6 +34,8 @@ struct userdiff_driver {
 	int textconv_want_cache;
 	const char *process;
 	char *process_owned;
+	struct diff_subprocess *diff_subprocess;
+	unsigned diff_process_failed : 1;
 };
 enum userdiff_driver_type {
 	USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v4 5/6] diff: bypass diff process with --no-ext-diff and in format-patch
From: Michael Montalbo via GitGitGadget @ 2026-06-14 18:59 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.v4.git.1781463564.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Make --no-ext-diff disable diff.<driver>.process in addition to
diff.<driver>.command.  Although the two mechanisms work differently
(command replaces Git's output, process feeds hunks back into the
pipeline), both invoke external tools and --no-ext-diff means
"no external tools."

Replace the OPT_BOOL for --ext-diff with an OPT_CALLBACK that
sets both allow_external and no_diff_process, so a single option
controls both.  Passing --ext-diff explicitly clears
no_diff_process, so a later --ext-diff overrides an earlier
--no-ext-diff.

Disable the diff process unconditionally in format-patch so that
generated patches are always based on the builtin diff algorithm
and can be applied reliably by recipients who do not have the
external tool.

Document that --diff-algorithm also bypasses the diff process,
since it forces the builtin algorithm.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 Documentation/diff-algorithm-option.adoc |  3 +++
 Documentation/diff-options.adoc          |  4 +++-
 builtin/log.c                            |  7 +++++++
 diff.c                                   | 16 ++++++++++++++--
 diff.h                                   |  4 +++-
 t/t4080-diff-process.sh                  | 16 ++++++++++++++++
 6 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/Documentation/diff-algorithm-option.adoc b/Documentation/diff-algorithm-option.adoc
index 8e3a0b63d7..4d7e2ec35f 100644
--- a/Documentation/diff-algorithm-option.adoc
+++ b/Documentation/diff-algorithm-option.adoc
@@ -18,3 +18,6 @@
 For instance, if you configured the `diff.algorithm` variable to a
 non-default value and want to use the default one, then you
 have to use `--diff-algorithm=default` option.
++
+If you explicitly choose a diff algorithm, it also bypasses
+`diff.<driver>.process` (see linkgit:gitattributes[5]).
diff --git a/Documentation/diff-options.adoc b/Documentation/diff-options.adoc
index c8242e2462..a884445211 100644
--- a/Documentation/diff-options.adoc
+++ b/Documentation/diff-options.adoc
@@ -833,7 +833,9 @@ endif::git-format-patch[]
 	to use this option with linkgit:git-log[1] and friends.
 
 `--no-ext-diff`::
-	Disallow external diff drivers.
+	Disallow external diff helpers, including
+	`diff.<driver>.command` and `diff.<driver>.process`
+	(see linkgit:gitattributes[5]).
 
 `--textconv`::
 `--no-textconv`::
diff --git a/builtin/log.c b/builtin/log.c
index e464b30af4..363052f468 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -2217,6 +2217,13 @@ int cmd_format_patch(int argc,
 	if (argc > 1)
 		die(_("unrecognized argument: %s"), argv[1]);
 
+	/*
+	 * Disable diff.<driver>.process so that patches generated by
+	 * format-patch are always based on the builtin diff algorithm
+	 * and can be applied reliably.
+	 */
+	rev.diffopt.flags.no_diff_process = 1;
+
 	if (rev.diffopt.output_format & DIFF_FORMAT_NAME)
 		die(_("--name-only does not make sense"));
 	if (rev.diffopt.output_format & DIFF_FORMAT_NAME_STATUS)
diff --git a/diff.c b/diff.c
index 3d97a188b9..4d9cb9b26b 100644
--- a/diff.c
+++ b/diff.c
@@ -5936,6 +5936,17 @@ static int diff_opt_submodule(const struct option *opt,
 	return 0;
 }
 
+static int diff_opt_ext_diff(const struct option *opt,
+			     const char *arg, int unset)
+{
+	struct diff_options *options = opt->value;
+
+	BUG_ON_OPT_ARG(arg);
+	options->flags.allow_external = !unset;
+	options->flags.no_diff_process = unset;
+	return 0;
+}
+
 static int diff_opt_textconv(const struct option *opt,
 			     const char *arg, int unset)
 {
@@ -6266,8 +6277,9 @@ struct option *add_diff_options(const struct option *opts,
 			 N_("exit with 1 if there were differences, 0 otherwise")),
 		OPT_BOOL(0, "quiet", &options->flags.quick,
 			 N_("disable all output of the program")),
-		OPT_BOOL(0, "ext-diff", &options->flags.allow_external,
-			 N_("allow an external diff helper to be executed")),
+		OPT_CALLBACK_F(0, "ext-diff", options, NULL,
+			       N_("allow an external diff helper to be executed"),
+			       PARSE_OPT_NOARG, diff_opt_ext_diff),
 		OPT_CALLBACK_F(0, "textconv", options, NULL,
 			       N_("run external text conversion filters when comparing binary files"),
 			       PARSE_OPT_NOARG, diff_opt_textconv),
diff --git a/diff.h b/diff.h
index 7dc157968d..bc7da6986a 100644
--- a/diff.h
+++ b/diff.h
@@ -173,7 +173,9 @@ struct diff_flags {
 	 */
 	unsigned allow_external;
 
-	/** Disables diff.<driver>.process. */
+	/**
+	 * Disables diff.<driver>.process.  Set by --no-ext-diff.
+	 */
 	unsigned no_diff_process;
 
 	/**
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index 9bb579b564..df4d08e31f 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -295,6 +295,22 @@ test_expect_success 'diff process bypassed by --diff-algorithm' '
 	test_path_is_missing backend.log
 '
 
+test_expect_success 'diff process bypassed by --no-ext-diff' '
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff --no-ext-diff worddiff.c >actual &&
+	test_grep "return 999" actual &&
+	test_path_is_missing backend.log
+'
+
+test_expect_success 'diff process not used by format-patch' '
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		format-patch -1 --stdout -- logtest.c >actual &&
+	test_grep "return 2" actual &&
+	test_path_is_missing backend.log
+'
+
 test_expect_success 'diff process not used by --stat' '
 	test_when_finished "rm -f backend.log" &&
 	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v4 6/6] blame: consult diff process for no-hunk detection
From: Michael Montalbo via GitGitGadget @ 2026-06-14 18:59 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.v4.git.1781463564.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

When a diff process is configured via diff.<driver>.process,
consult it during blame's per-commit diffing.  If the process
returns no hunks for a commit's changes to a file, treat the
commit as having no changes, causing blame to attribute lines
to earlier commits.

The consultation happens at the pass_blame_to_parent() callsite
using diff_process_fill_hunks(), matching how builtin_diff() in
diff.c uses the same function.  A new diff_hunks_xpp() variant
accepts a pre-populated xpparam_t so callers can pass external
hunks, while the existing diff_hunks() retains its original
signature and behavior.  The copy-detection callsite is
unaffected since it does not use the diff process.

The subprocess is long-running (one startup cost amortized
across the blame traversal), but each commit in the file's
history incurs a round-trip to the tool.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 blame.c                 |  40 +++++++++++----
 t/t4080-diff-process.sh | 105 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+), 9 deletions(-)

diff --git a/blame.c b/blame.c
index 977cbb7097..354e6c15f4 100644
--- a/blame.c
+++ b/blame.c
@@ -19,6 +19,8 @@
 #include "tag.h"
 #include "trace2.h"
 #include "blame.h"
+#include "diff-process.h"
+#include "xdiff-interface.h"
 #include "alloc.h"
 #include "commit-slab.h"
 #include "bloom.h"
@@ -314,17 +316,25 @@ static struct commit *fake_working_tree_commit(struct repository *r,
 
 
 
-static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
-		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
+static int diff_hunks_xpp(mmfile_t *file_a, mmfile_t *file_b,
+			  xdl_emit_hunk_consume_func_t hunk_func,
+			  void *cb_data, xpparam_t *xpp)
 {
-	xpparam_t xpp = {0};
 	xdemitconf_t xecfg = {0};
 	xdemitcb_t ecb = {NULL};
 
-	xpp.flags = xdl_opts;
 	xecfg.hunk_func = hunk_func;
 	ecb.priv = cb_data;
-	return xdi_diff(file_a, file_b, &xpp, &xecfg, &ecb);
+	return xdi_diff(file_a, file_b, xpp, &xecfg, &ecb);
+}
+
+static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
+		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
+{
+	xpparam_t xpp = {0};
+
+	xpp.flags = xdl_opts;
+	return diff_hunks_xpp(file_a, file_b, hunk_func, cb_data, &xpp);
 }
 
 static const char *get_next_line(const char *start, const char *end)
@@ -1943,6 +1953,7 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
 				 struct blame_origin *parent, int ignore_diffs)
 {
 	mmfile_t file_p, file_o;
+	xpparam_t xpp = {0};
 	struct blame_chunk_cb_data d;
 	struct blame_entry *newdest = NULL;
 
@@ -1961,10 +1972,21 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
 			 &sb->num_read_blob, ignore_diffs);
 	sb->num_get_patch++;
 
-	if (diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, sb->xdl_opts))
-		die("unable to generate diff (%s -> %s)",
-		    oid_to_hex(&parent->commit->object.oid),
-		    oid_to_hex(&target->commit->object.oid));
+	xpp.flags = sb->xdl_opts;
+	/*
+	 * If the diff process considers the files equivalent,
+	 * skip the diff so blame looks past this commit.
+	 */
+	if (diff_process_fill_hunks(&sb->revs->diffopt, target->path,
+				    &file_p, &file_o, &xpp)
+	    != DIFF_PROCESS_EQUIVALENT) {
+		if (diff_hunks_xpp(&file_p, &file_o, blame_chunk_cb,
+				   &d, &xpp))
+			die("unable to generate diff (%s -> %s)",
+			    oid_to_hex(&parent->commit->object.oid),
+			    oid_to_hex(&target->commit->object.oid));
+	}
+	free(xpp.external_hunks);
 	/* The rest are the same as the parent */
 	blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, 0,
 		    parent, target, 0);
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index df4d08e31f..9fc3c01eec 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -445,4 +445,109 @@ test_expect_success 'diff process skipped when tool omits capability' '
 	test_must_be_empty stderr
 '
 
+#
+# Blame integration.
+#
+
+test_expect_success 'blame uses tool-provided hunks' '
+	cat >blame-hunk.c <<-\EOF &&
+	line1
+	line2
+	line3
+	line4
+	original5
+	original6
+	line7
+	line8
+	line9
+	line10
+	EOF
+	git add blame-hunk.c &&
+	git commit -m "add blame-hunk.c" &&
+	ORIG=$(git rev-parse --short HEAD) &&
+
+	cat >blame-hunk.c <<-\EOF &&
+	line1
+	line2
+	line3
+	line4
+	changed5
+	changed6
+	line7
+	line8
+	changed9
+	changed10
+	EOF
+	git add blame-hunk.c &&
+	git commit -m "change blame-hunk.c" &&
+	CHANGE=$(git rev-parse --short HEAD) &&
+
+	# With fixed-hunk mode the tool reports only lines 5-6 as changed,
+	# so blame should attribute lines 9-10 to the original commit
+	# even though the builtin diff would show them as changed.
+	git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
+		blame blame-hunk.c >actual &&
+	sed -n "9p" actual >line9 &&
+	sed -n "10p" actual >line10 &&
+	test_grep "$ORIG" line9 &&
+	test_grep "$ORIG" line10 &&
+	sed -n "5p" actual >line5 &&
+	sed -n "6p" actual >line6 &&
+	test_grep "$CHANGE" line5 &&
+	test_grep "$CHANGE" line6
+'
+
+test_expect_success 'blame skips commits with no hunks from diff process' '
+	cat >blame.c <<-\EOF &&
+	int main(void) {
+	return 0;
+	}
+	EOF
+	git add blame.c &&
+	git commit -m "add blame.c" &&
+	ORIG_COMMIT=$(git rev-parse --short HEAD) &&
+
+	cat >blame.c <<-\EOF &&
+	int main(void)
+	{
+	return 0;
+	}
+	EOF
+	git add blame.c &&
+	git commit -m "reformat blame.c" &&
+	BLAME_COMMIT=$(git rev-parse --short HEAD) &&
+
+	# Without no-hunks mode, blame attributes the change.
+	git blame blame.c >without &&
+	test_grep "$BLAME_COMMIT" without &&
+
+	# With no-hunks mode, the process considers the files equivalent
+	# and blame skips the reformat commit, attributing to the original.
+	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
+		blame blame.c >with &&
+	test_grep ! "$BLAME_COMMIT" with &&
+	test_grep "$ORIG_COMMIT" with
+'
+
+test_expect_success 'blame --no-ext-diff bypasses diff process' '
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
+		blame --no-ext-diff blame.c >actual &&
+	# Without the process, blame attributes the reformat commit normally.
+	test_grep "$BLAME_COMMIT" actual &&
+	test_path_is_missing backend.log
+'
+
+test_expect_success 'blame --no-ext-diff uses builtin hunks' '
+	# fixed-hunk mode would narrow blame to lines 5-6, but
+	# --no-ext-diff should bypass it and use the builtin diff.
+	test_when_finished "rm -f backend.log" &&
+	git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk --log=backend.log" \
+		blame --no-ext-diff blame-hunk.c >actual &&
+	# Builtin diff attributes lines 9-10 to the change commit.
+	sed -n "9p" actual >line9 &&
+	test_grep "$CHANGE" line9 &&
+	test_path_is_missing backend.log
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH 0/2] rebase: add --fixup to fold a range into its oldest commit
From: Harald Nordgren via GitGitGadget @ 2026-06-14 19:25 UTC (permalink / raw)
  To: git; +Cc: Harald Nordgren

Adds git rebase --autosquash --fixup [<upstream>] to fold a range of commits
into its oldest one, reusing that commit's message.

Related idea: https://github.com/gitgitgadget/git/issues/1135

Harald Nordgren (2):
  t3415: remove prepare-commit-msg hook after use
  rebase: add --fixup-all to fold a range

 Documentation/git-rebase.adoc |  11 ++++
 builtin/rebase.c              |  13 ++++-
 sequencer.c                   |  24 +++++++-
 sequencer.h                   |   2 +-
 t/t3415-rebase-autosquash.sh  | 106 ++++++++++++++++++++++++++++++++++
 5 files changed, 152 insertions(+), 4 deletions(-)


base-commit: ea97ad8d017de0c9037451a78008a0fd60abea0c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2337%2FHaraldNordgren%2Frebase-fixup-fold-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2337/HaraldNordgren/rebase-fixup-fold-v1
Pull-Request: https://github.com/git/git/pull/2337
-- 
gitgitgadget

^ permalink raw reply

* [PATCH 1/2] t3415: remove prepare-commit-msg hook after use
From: Harald Nordgren via GitGitGadget @ 2026-06-14 19:25 UTC (permalink / raw)
  To: git; +Cc: Harald Nordgren, Harald Nordgren
In-Reply-To: <pull.2337.git.git.1781465141.gitgitgadget@gmail.com>

From: Harald Nordgren <haraldnordgren@gmail.com>

The "pick and fixup respect commit.cleanup" test left its
prepare-commit-msg hook in place, leaking it into later tests. Remove it
with test_when_finished.

Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
---
 t/t3415-rebase-autosquash.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t3415-rebase-autosquash.sh b/t/t3415-rebase-autosquash.sh
index 5033411a43..8964d1cc88 100755
--- a/t/t3415-rebase-autosquash.sh
+++ b/t/t3415-rebase-autosquash.sh
@@ -490,6 +490,7 @@ test_expect_success 'pick and fixup respect commit.cleanup' '
 	git reset --hard base &&
 	test_commit --no-tag "fixup! second commit" file1 fixup &&
 	test_commit something &&
+	test_when_finished "rm -f .git/hooks/prepare-commit-msg" &&
 	write_script .git/hooks/prepare-commit-msg <<-\EOF &&
 	printf "\n# Prepared\n" >> "$1"
 	EOF
-- 
gitgitgadget


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox