Git development

Git development
 help / color / mirror / Atom feed

* Re: [BUG] internal date format does not accept small unix timestamps
From: Junio C Hamano @ 2026-05-29 22:52 UTC (permalink / raw)
  To: Luna Schwalbe; +Cc: Kristoffer Haugsbakk, git
In-Reply-To: <08a04d91-af90-44dd-b28f-f3d5b9e77413@luna.gl>

Luna Schwalbe <dev@luna.gl> writes:

>  > Apparently you need `@` in front for small Unix Epoch values. `@0 +0000`
>
> That is wonderful, thank you so much, I somehow did not find this small 
> detail anywhere.
>
> Maybe it could be added to Documentation/date-formats.adoc?
>
> Luna

Good suggestion.

This was introduced in 116eb3ab (parse_date(): allow ancient
git-timestamp, 2012-02-02) and 2c733fb2 (parse_date(): '@' prefix
forces git-timestamp, 2012-02-02) to allow specifying "ancient"
timestamps (like 0 +0000) without conflicting with YYYYMMDD date
formats.  I do not think neither commit added documentation for this
'@' prefix, and Documentation/date-formats would be an excellent
place to do so.

Care to whip up a patch?

Thanks.

^ permalink raw reply

* Re: [PATCH 0/3] pack-objects: support bitmaps and delta-islands with `--path-walk`
From: Taylor Blau @ 2026-05-29 22:20 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <2d68fdb2-ac05-4331-b53e-53c2e9a2b3d4@gmail.com>

On Fri, May 29, 2026 at 05:28:32PM -0400, Derrick Stolee wrote:
> > My reading here is that we get significantly smaller packs (i.e. all
> > 'test_size' tests drop from HEAD~3 to HEAD) in the same amount of time
> > (i.e. that all 'test_perf' tests are roughly flat).
>
> The sizes don't shrink, and in one case increases by a small amount. I'm
> happy to count those cases as noise from multi-threaded delta calculations
> being less deterministic.
>
> The _time_ taken to compute the packfiles is what decreases, though, which
> is promising.
>
> > That lines up with my expectation here, which is that even though we're
> > using bitmaps at read time, that's effectively seeding the verbatim pack
> > reuse over a significantly smaller pack, producing a much smaller output
> > pack as a result.
>
> Can you double-check this reasoning with my read of the data? The size
> isn't changing, but the computation time is.

Yeah, that's right, and my apologies for being in a slight rush when
sending this to you ;-).

The size staying flat makes sense, since both packs were generated with
--path-walk, we're just changing the way they're served. In HEAD~3, that
pack is generated on-the-fly and sent over to the client. At HEAD, we're
doing verbatim reuse over an already-existing pack which we got via
repacking (also generated with --path-walk).

So I think you get to the same end-result (more or less, modulo usual
delta patching via pack-reuse), but the time to get there drops
significantly since we don't have to (re)compute the pack.

> > +test_repack_with_args () {
> > +	args="$@"
> > +	export args
> > +
> > +	test_perf "repack with $args" '
> > +		git repack -adf $args
> > +	'
> > +
> > +	test_size "repack size with $args" '
> > +		gitdir=$(git rev-parse --git-dir) &&
> > +		pack=$(ls $gitdir/objects/pack/pack-*.pack) &&
> > +		test_file_size "$pack"
> > +	'
> > +}
> > +
> I see that these tests are extracted from test_all_... below:
>
> [...]
>
> Because the --use-bitmap-index and --write-bitmap-index args
> aren't appropriate across these different commands.

Exactly.

> nit: the diff would be more obvious if test_repack_with_args
> was defined after test_all_with_args so the hunk of existing
> tests wouldn't appear in the diff.

Fair enough :-).

> >  for version in 1 2
> >  do
> > -	test_all_with_args --name-hash-version=$version
> > +	arg="--name-hash-version=$version" &&
> > +
> > +	test_all_with_args "$arg" &&
> > +	test_repack_with_args "$arg" || return 1
> >  done
> >
> >  test_all_with_args --path-walk
> > +test_repack_with_args --path-walk
> > +
> > +# inverted order here: we want to test using reachability bitmaps on a
> > +# pack written with --path-walk
> > +test_repack_with_args --path-walk --write-bitmap-index
> > +test_all_with_args --use-bitmap-index
>
> So this allows us to test all of the different modes.
>
> > --- >8 ---
> >
> > I don't have a strong opinion on whether or not we should include that
> > in this series or elsewhere.
>
> I'm interested to see some results of your new p5313 test
> to make sure that we are getting expected size changes for
> the repack, since the p5311 tests were more focused on the
> thin fetch pack (and didn't show a change in size).
>
> For that, I'd be interested to see this test be included in
> a patch for future reference, too.

Will do.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 0/3] pack-objects: support bitmaps and delta-islands with `--path-walk`
From: Derrick Stolee @ 2026-05-29 21:28 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <ahnx/oBWe9yyBZFg@nand.local>

On 5/29/2026 4:07 PM, Taylor Blau wrote:
> On Fri, May 29, 2026 at 01:26:33PM -0400, Derrick Stolee wrote:
>> On 5/28/26 11:28 AM, Derrick Stolee wrote:
>>> On 5/27/26 7:18 PM, Taylor Blau wrote:
>>
>>> Do you have any end-to-end performance data to demonstrate that these
>>> changes are effective at scale? Are we still producing packfiles with the
>>> pack-file compression and now with .bitmap files? How does this impact
>>> the performance of a clone or fetch when using a bitmap index at read
>>> time?
>>
>> Here's my attempt to use our existing performance tests to analyze the
>> impact of this series.
>>
>> Running p5311 against the base of this topic and this topic with
>> GIT_TEST_PACK_PATH_WALK=1, I get this output:
> 
> Yikes. That's not great, but see below for what I think is going on.
> 
> (As an aside, we can focus in on either lookup=true or lookup=false,
> since these are just controlling whether or not the bitmap lookup table
> is written. On a repository as small as git.git, this shouldn't make a
> huge difference either way. I have a separate series to make this the
> default and to clean up the t/perf suite accordingly, but haven't sent
> it to the list yet.)
> 
>> Test                                     HEAD~3    HEAD
>> -----------------------------------------------------------------
>> [...]
> 
> I think this is either the primary reason why you're not seeing an
> improvement here, or at least related to it...
> 
>> Do you have a good feeling for why the path-walk feature doesn't
>> make a huge change in these test scenarios?
> 
> I think the problem is that we're relying on the TEST_ variable to tell
> pack-objects to generate a pack using --path-walk, but treat it as a
> fallback.
> 
> I suspect that since p5311 invokes repack *without* the '--path-walk'
> option, we end up in this case within cmd_pack_objects():
> 
> 	if (path_walk < 0) {
> 		if (use_bitmap_index > 0 ||
> 		    !use_internal_rev_list)
> 			path_walk = 0;
> 		else if (the_repository->gitdir &&
> 			 the_repository->settings.pack_use_path_walk)
> 			path_walk = 1;
> 		else
> 			path_walk = git_env_bool("GIT_TEST_PACK_PATH_WALK", 0);
> 	}
> 
> , where `path_walk` is *not* set, but `use_bitmap_index` is since p5311
> set the 'pack.writeBitmaps' configuration (as a separate aside, this
> should really prefer the non-deprecated 'repack.writeBitmaps' variant).
I see. When `--use-bitmap-index` is specified, then the test variable is
ignored. So my tests aren't actually measuring the intended end state.

> So in that case, we fall back to `path_walk = 0`, and don't even bother
> reading the `GIT_TEST_PACK_PATH_WALK` variable.
> 
> If I modify the perf test like so:
> 
> --- 8< ---
> diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
> index 047efb995d6..1c9c99216e3 100755
> --- a/t/perf/p5311-pack-bitmaps-fetch.sh
> +++ b/t/perf/p5311-pack-bitmaps-fetch.sh
> @@ -13,7 +13,7 @@ test_fetch_bitmaps () {
>  	test_expect_success 'create bitmapped server repo' '
>  		git config pack.writebitmaps true &&
>  		git config pack.writeBitmapLookupTable '"$1"' &&
> -		git repack -ad
> +		git repack -ad --path-walk
>  	'
> 
>  	# simulate a fetch from a repository that last fetched N days ago, for
> --- >8 ---
> , then I can get significantly improved results when running without the
> GIT_TEST_PACK_PATH_WALK variablle (here I'm truncating the
> 'lookup=false' case, which performs nearly identically):
> 
>     Test                                       HEAD~3            HEAD
>     ------------------------------------------------------------------------------------
>     5311.4: server   (1 days) (lookup=true)    2.57(2.52+0.04)   0.03(0.02+0.00) -98.8%
>     5311.5: size     (1 days)                           153.4K            153.4K +0.0%
>     5311.6: client   (1 days) (lookup=true)    0.00(0.01+0.00)   0.00(0.01+0.00) =
>     5311.8: server   (2 days) (lookup=true)    2.60(2.55+0.04)   0.02(0.02+0.00) -99.2%
>     5311.9: size     (2 days)                           153.4K            153.4K +0.0%
>     5311.10: client   (2 days) (lookup=true)   0.00(0.01+0.00)   0.00(0.01+0.00) =
>     5311.12: server   (4 days) (lookup=true)   2.60(2.54+0.05)   0.03(0.03+0.00) -98.8%
>     5311.13: size     (4 days)                          209.0K            209.0K +0.0%
>     5311.14: client   (4 days) (lookup=true)   0.01(0.02+0.00)   0.01(0.01+0.00) +0.0%
>     5311.16: server   (8 days) (lookup=true)   2.58(2.53+0.04)   0.03(0.03+0.00) -98.8%
>     5311.17: size     (8 days)                          209.0K            209.0K +0.0%
>     5311.18: client   (8 days) (lookup=true)   0.01(0.01+0.00)   0.01(0.02+0.00) +0.0%
>     5311.20: server  (16 days) (lookup=true)   2.58(2.52+0.05)   0.03(0.03+0.00) -98.8%
>     5311.21: size    (16 days)                          209.0K            209.0K +0.0%
>     5311.22: client  (16 days) (lookup=true)   0.01(0.02+0.00)   0.01(0.01+0.00) +0.0%
>     5311.24: server  (32 days) (lookup=true)   2.61(2.58+0.03)   0.03(0.02+0.01) -98.9%
>     5311.25: size    (32 days)                          212.9K            212.9K +0.0%
>     5311.26: client  (32 days) (lookup=true)   0.01(0.02+0.00)   0.02(0.02+0.00) +100.0%
>     5311.28: server  (64 days) (lookup=true)   2.72(2.79+0.06)   0.19(0.30+0.03) -93.0%
>     5311.29: size    (64 days)                            4.5M              4.5M -0.0%
>     5311.30: client  (64 days) (lookup=true)   0.49(0.58+0.02)   0.48(0.56+0.04) -2.0%
>     5311.32: server (128 days) (lookup=true)   2.90(3.21+0.09)   0.35(0.70+0.04) -87.9%
>     5311.33: size   (128 days)                            9.4M              9.5M +0.4%
>     5311.34: client (128 days) (lookup=true)   0.98(1.27+0.08)   0.98(1.33+0.06) +0.0%
> 
> My reading here is that we get significantly smaller packs (i.e. all
> 'test_size' tests drop from HEAD~3 to HEAD) in the same amount of time
> (i.e. that all 'test_perf' tests are roughly flat).

The sizes don't shrink, and in one case increases by a small amount. I'm
happy to count those cases as noise from multi-threaded delta calculations
being less deterministic.

The _time_ taken to compute the packfiles is what decreases, though, which
is promising.

> That lines up with my expectation here, which is that even though we're
> using bitmaps at read time, that's effectively seeding the verbatim pack
> reuse over a significantly smaller pack, producing a much smaller output
> pack as a result.

Can you double-check this reasoning with my read of the data? The size
isn't changing, but the computation time is.

> As to whether we should modify the perf suite to test this, naturally I
> think we should. Likely that looks like modifying p5313 to re-run
> `test_all_with_args` with '--use-bitmap-index' after repacking with
> --path-walk and generating bitmaps like so (untested):
> 
> --- 8< ---
> diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh
> index 46a6cd32d24..663717982b1 100755
> --- a/t/perf/p5313-pack-objects.sh
> +++ b/t/perf/p5313-pack-objects.sh
> @@ -22,6 +22,21 @@ test_expect_success 'create rev input' '
>  	EOF
>  '
> 
> +test_repack_with_args () {
> +	args="$@"
> +	export args
> +
> +	test_perf "repack with $args" '
> +		git repack -adf $args
> +	'
> +
> +	test_size "repack size with $args" '
> +		gitdir=$(git rev-parse --git-dir) &&
> +		pack=$(ls $gitdir/objects/pack/pack-*.pack) &&
> +		test_file_size "$pack"
> +	'
> +}
> +
I see that these tests are extracted from test_all_... below:

>  test_all_with_args () {
>  	parameter=$1
>  	export parameter
> @@ -52,23 +67,22 @@ test_all_with_args () {
>  	test_size "shallow pack size with $parameter" '
>  		test_file_size out
>  	'
> -
> -	test_perf "repack with $parameter" '
> -		git repack -adf $parameter
> -	'
> -
> -	test_size "repack size with $parameter" '
> -		gitdir=$(git rev-parse --git-dir) &&
> -		pack=$(ls $gitdir/objects/pack/pack-*.pack) &&
> -		test_file_size "$pack"
> -	'
>  }

Because the --use-bitmap-index and --write-bitmap-index args
aren't appropriate across these different commands.

nit: the diff would be more obvious if test_repack_with_args
was defined after test_all_with_args so the hunk of existing
tests wouldn't appear in the diff.

>  for version in 1 2
>  do
> -	test_all_with_args --name-hash-version=$version
> +	arg="--name-hash-version=$version" &&
> +
> +	test_all_with_args "$arg" &&
> +	test_repack_with_args "$arg" || return 1
>  done
> 
>  test_all_with_args --path-walk
> +test_repack_with_args --path-walk
> +
> +# inverted order here: we want to test using reachability bitmaps on a
> +# pack written with --path-walk
> +test_repack_with_args --path-walk --write-bitmap-index
> +test_all_with_args --use-bitmap-index

So this allows us to test all of the different modes.

> --- >8 ---
> 
> I don't have a strong opinion on whether or not we should include that
> in this series or elsewhere.
I'm interested to see some results of your new p5313 test
to make sure that we are getting expected size changes for
the repack, since the p5311 tests were more focused on the
thin fetch pack (and didn't show a change in size).

For that, I'd be interested to see this test be included in
a patch for future reference, too.

Thanks,
-Stolee


^ permalink raw reply

* Re: git hook question
From: Jeff King @ 2026-05-29 21:00 UTC (permalink / raw)
  To: Wesley Schwengle; +Cc: Adrian Ratiu, git
In-Reply-To: <4d938e1e-fdd3-42d6-a879-4d394ee8c00d@opperschaap.net>

[re-adding list cc; let's let everyone benefit from the discussion]

On Fri, May 29, 2026 at 04:14:33PM -0400, Wesley Schwengle wrote:

> > I don't think the hooks themselves should need to be aware. If somebody
> > is calling "git hook run pre-push" without providing arguments, they are
> > breaking the contract to the hooks. You can get away with it if you know
> > your particular hooks do not care about those arguments, but in the
> > general case, what should a pre-push hook that _does_ care about the
> > remote name do when it doesn't get any arguments? It's an error.
> 
> Are they? The manual says this:
> 
> git hook run has been designed to make it easy for tools which wrap Git to
> configure and execute hooks using the Git hook infrastructure.  It is
> possible to provide arguments and stdin via the command line, as well as
> specifying parallel or series execution if the user has provided multiple
> hooks.
> 
>      Assuming your wrapper wants to support a hook named
> "mywrapper-start-tests", you can have your users specify their hooks like
> so:
> 
>          [hook "setup-test-dashboard"]
>            event = mywrapper-start-tests
>            command = ~/mywrapper/setup-dashboard.py --tap
> 
>      Then, in your mywrapper tool, you can invoke any users' configured
> hooks by running:
> 
>          git hook run --allow-unknown-hook-name mywrapper-start-tests \
>            # providing something to stdin
>            --stdin some-tempfile-123 \
>            # execute multiple hooks in parallel
>            --jobs 3 \
>            # plus some arguments of your own...
>            -- \
>            --testname bar \
>            baz
> 
> There is nothing about the contract of the hook, in fact, the way it is
> written there isn't really a contract.

This is a made-up hook, so it is up to the person defining
mywrapper-start-tests to define that contract. And in this example,
implicitly it takes whatever is in some-tempfile-123 on stdin, and
--testname as an argument. What those mean would need to be communicated
between the script invoking "git hook" and whoever is configuring hooks.

I agree that is not made very clear in the documentation, though.

> > So whether you are getting input as arguments or over stdin, it's
> > probably something the hook needs to deal with (or at least think
> > about).
> 
> Right. I see where this is going. That means I think the examples in the
> manual are incorrect, no, that's harsh, it could be stated more clearly in
> git-hook(1).
> 
> Examples like this:
> 
> > [hook "linter"]
> >   event = pre-commit
> >   command = ~/bin/linter --cpp20
> 
> seem to indicate: Any script can be run as a hook, the fact it needs to
> respect the native hook structure isn't mentioned. This is mentioned:

That example is OK-ish, in the sense that pre-commit does not take any
arguments or receive anything on stdin. So you really can invoke
whatever program you like (though it needs to understand how to use Git
commands to look at what is staged in the index). So the details of
"~/bin/linter" are doing a lot of the heavy lifting here, which is left
unsaid.

But the later example that adds "event = pre-push" is actively
misleading. How does the ~/bin/linter script even know in which context
it's being run? In the real world you are more likely to invoke a script
that is aware it is a Git hook and can react accordingly.

So I suspect there is a lot of room for expanding the documentation and
explaining some of these gotchas. +cc Adrian, who wrote these docs, for
visibility.

-Peff

^ permalink raw reply

* [PATCH v3 6/6] blame: consult diff process for no-hunk detection
From: Michael Montalbo via GitGitGadget @ 2026-05-29 20:48 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.v3.git.1780087700.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

When a diff process is configured via diff.<driver>.process,
consult it during blame's per-commit diffing.  If the process
returns no hunks for a commit's changes to a file, treat the
commit as having no changes, causing blame to attribute lines
to earlier commits.

The consultation happens at the pass_blame_to_parent() callsite
using diff_process_fill_hunks(), matching how builtin_diff() in
diff.c uses the same function.  A new diff_hunks_xpp() variant
accepts a pre-populated xpparam_t for this callsite, while the
existing diff_hunks() retains its original signature and behavior.
The copy-detection callsite is unaffected since it does not use
the diff process.

The subprocess is long-running (one startup cost amortized
across the blame traversal), but each commit in the file's
history incurs a round-trip to the tool.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 blame.c                 |  40 +++++++++++----
 t/t4080-diff-process.sh | 106 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 137 insertions(+), 9 deletions(-)

diff --git a/blame.c b/blame.c
index a3c49d132e..79f02735a4 100644
--- a/blame.c
+++ b/blame.c
@@ -19,6 +19,8 @@
 #include "tag.h"
 #include "trace2.h"
 #include "blame.h"
+#include "diff-process.h"
+#include "xdiff-interface.h"
 #include "alloc.h"
 #include "commit-slab.h"
 #include "bloom.h"
@@ -314,17 +316,25 @@ static struct commit *fake_working_tree_commit(struct repository *r,
 
 
 
-static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
-		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
+static int diff_hunks_xpp(mmfile_t *file_a, mmfile_t *file_b,
+			  xdl_emit_hunk_consume_func_t hunk_func,
+			  void *cb_data, xpparam_t *xpp)
 {
-	xpparam_t xpp = {0};
 	xdemitconf_t xecfg = {0};
 	xdemitcb_t ecb = {NULL};
 
-	xpp.flags = xdl_opts;
 	xecfg.hunk_func = hunk_func;
 	ecb.priv = cb_data;
-	return xdi_diff(file_a, file_b, &xpp, &xecfg, &ecb);
+	return xdi_diff(file_a, file_b, xpp, &xecfg, &ecb);
+}
+
+static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
+		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
+{
+	xpparam_t xpp = {0};
+
+	xpp.flags = xdl_opts;
+	return diff_hunks_xpp(file_a, file_b, hunk_func, cb_data, &xpp);
 }
 
 static const char *get_next_line(const char *start, const char *end)
@@ -1943,6 +1953,7 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
 				 struct blame_origin *parent, int ignore_diffs)
 {
 	mmfile_t file_p, file_o;
+	xpparam_t xpp = {0};
 	struct blame_chunk_cb_data d;
 	struct blame_entry *newdest = NULL;
 
@@ -1961,10 +1972,21 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
 			 &sb->num_read_blob, ignore_diffs);
 	sb->num_get_patch++;
 
-	if (diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, sb->xdl_opts))
-		die("unable to generate diff (%s -> %s)",
-		    oid_to_hex(&parent->commit->object.oid),
-		    oid_to_hex(&target->commit->object.oid));
+	xpp.flags = sb->xdl_opts;
+	/*
+	 * If the diff process considers the files equivalent,
+	 * skip the diff so blame looks past this commit.
+	 */
+	if (diff_process_fill_hunks(&sb->revs->diffopt, target->path,
+				    &file_p, &file_o, &xpp)
+	    != DIFF_PROCESS_EQUIVALENT) {
+		if (diff_hunks_xpp(&file_p, &file_o, blame_chunk_cb,
+				   &d, &xpp))
+			die("unable to generate diff (%s -> %s)",
+			    oid_to_hex(&parent->commit->object.oid),
+			    oid_to_hex(&target->commit->object.oid));
+	}
+	free(xpp.external_hunks);
 	/* The rest are the same as the parent */
 	blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, 0,
 		    parent, target, 0);
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index ee0c306abd..fdf6da1c34 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -551,4 +551,110 @@ test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
 	test_grep "NEW5" actual
 '
 
+#
+# Blame integration.
+#
+
+test_expect_success PYTHON 'blame uses tool-provided hunks' '
+	cat >blame-hunk.c <<-\EOF &&
+	line1
+	line2
+	line3
+	line4
+	original5
+	original6
+	line7
+	line8
+	line9
+	line10
+	EOF
+	git add blame-hunk.c &&
+	git commit -m "add blame-hunk.c" &&
+	ORIG=$(git rev-parse --short HEAD) &&
+
+	cat >blame-hunk.c <<-\EOF &&
+	line1
+	line2
+	line3
+	line4
+	changed5
+	changed6
+	line7
+	line8
+	changed9
+	changed10
+	EOF
+	git add blame-hunk.c &&
+	git commit -m "change blame-hunk.c" &&
+	CHANGE=$(git rev-parse --short HEAD) &&
+
+	# With fixed-hunk mode the tool reports only lines 5-6 as changed,
+	# so blame should attribute lines 9-10 to the original commit
+	# even though the builtin diff would show them as changed.
+	git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
+		blame blame-hunk.c >actual &&
+	sed -n "9p" actual >line9 &&
+	sed -n "10p" actual >line10 &&
+	test_grep "$ORIG" line9 &&
+	test_grep "$ORIG" line10 &&
+	sed -n "5p" actual >line5 &&
+	sed -n "6p" actual >line6 &&
+	test_grep "$CHANGE" line5 &&
+	test_grep "$CHANGE" line6
+'
+
+test_expect_success PYTHON 'blame skips commits with no hunks from diff process' '
+	cat >blame.c <<-\EOF &&
+	int main(void)
+	{
+	    return 0;
+	}
+	EOF
+	git add blame.c &&
+	git commit -m "add blame.c" &&
+	ORIG_COMMIT=$(git rev-parse --short HEAD) &&
+
+	cat >blame.c <<-\EOF &&
+	int main(void)
+	{
+	        return 0;
+	}
+	EOF
+	git add blame.c &&
+	git commit -m "reformat blame.c" &&
+	BLAME_COMMIT=$(git rev-parse --short HEAD) &&
+
+	# Without no-hunks mode, blame attributes the change.
+	git blame blame.c >without &&
+	test_grep "$BLAME_COMMIT" without &&
+
+	# With no-hunks mode, the process considers the files equivalent
+	# and blame skips the reformat commit, attributing to the original.
+	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
+		blame blame.c >with &&
+	test_grep ! "$BLAME_COMMIT" with &&
+	test_grep "$ORIG_COMMIT" with
+'
+
+test_expect_success PYTHON 'blame --no-ext-diff bypasses diff process' '
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
+		blame --no-ext-diff blame.c >actual &&
+	# Without the process, blame attributes the reformat commit normally.
+	test_grep "$BLAME_COMMIT" actual &&
+	test_path_is_missing backend.log
+'
+
+test_expect_success PYTHON 'blame --no-ext-diff uses builtin hunks' '
+	# fixed-hunk mode would narrow blame to lines 5-6, but
+	# --no-ext-diff should bypass it and use the builtin diff.
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk --log=backend.log" \
+		blame --no-ext-diff blame-hunk.c >actual &&
+	# Builtin diff attributes lines 9-10 to the change commit.
+	sed -n "9p" actual >line9 &&
+	test_grep "$CHANGE" line9 &&
+	test_path_is_missing backend.log
+'
+
 test_done
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v3 5/6] diff: bypass diff process with --no-ext-diff and in format-patch
From: Michael Montalbo via GitGitGadget @ 2026-05-29 20:48 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.v3.git.1780087700.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Make --no-ext-diff disable diff.<driver>.process in addition to
diff.<driver>.command.  Although the two mechanisms work differently
(command replaces Git's output, process feeds hunks back into the
pipeline), both invoke external tools and --no-ext-diff means
"no external tools."

Replace the OPT_BOOL for --ext-diff with an OPT_CALLBACK that
sets both allow_external and no_diff_process, so a single option
controls both.  Passing --ext-diff explicitly clears
no_diff_process, so a later --ext-diff overrides an earlier
--no-ext-diff.

Disable the diff process unconditionally in format-patch so that
generated patches are always based on the builtin diff algorithm
and can be applied reliably by recipients who do not have the
external tool.

Document that --diff-algorithm also bypasses the diff process,
since it sets ignore_driver_algorithm which diff_process_fill_hunks
already checks.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 Documentation/diff-algorithm-option.adoc |  3 +++
 Documentation/diff-options.adoc          |  4 +++-
 builtin/log.c                            |  7 +++++++
 diff.c                                   | 16 ++++++++++++++--
 diff.h                                   |  4 +++-
 t/t4080-diff-process.sh                  | 16 ++++++++++++++++
 6 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/Documentation/diff-algorithm-option.adoc b/Documentation/diff-algorithm-option.adoc
index 8e3a0b63d7..4d7e2ec35f 100644
--- a/Documentation/diff-algorithm-option.adoc
+++ b/Documentation/diff-algorithm-option.adoc
@@ -18,3 +18,6 @@
 For instance, if you configured the `diff.algorithm` variable to a
 non-default value and want to use the default one, then you
 have to use `--diff-algorithm=default` option.
++
+If you explicitly choose a diff algorithm, it also bypasses
+`diff.<driver>.process` (see linkgit:gitattributes[5]).
diff --git a/Documentation/diff-options.adoc b/Documentation/diff-options.adoc
index 8a63b5e164..18b8b0ed24 100644
--- a/Documentation/diff-options.adoc
+++ b/Documentation/diff-options.adoc
@@ -825,7 +825,9 @@ endif::git-format-patch[]
 	to use this option with linkgit:git-log[1] and friends.
 
 `--no-ext-diff`::
-	Disallow external diff drivers.
+	Disallow external diff helpers, including
+	`diff.<driver>.command` and `diff.<driver>.process`
+	(see linkgit:gitattributes[5]).
 
 `--textconv`::
 `--no-textconv`::
diff --git a/builtin/log.c b/builtin/log.c
index 8c0939dd42..1ea520c12d 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -2213,6 +2213,13 @@ int cmd_format_patch(int argc,
 	if (argc > 1)
 		die(_("unrecognized argument: %s"), argv[1]);
 
+	/*
+	 * Disable diff.<driver>.process so that patches generated by
+	 * format-patch are always based on the builtin diff algorithm
+	 * and can be applied reliably.
+	 */
+	rev.diffopt.flags.no_diff_process = 1;
+
 	if (rev.diffopt.output_format & DIFF_FORMAT_NAME)
 		die(_("--name-only does not make sense"));
 	if (rev.diffopt.output_format & DIFF_FORMAT_NAME_STATUS)
diff --git a/diff.c b/diff.c
index 2d5ed6ea8c..38932db084 100644
--- a/diff.c
+++ b/diff.c
@@ -5913,6 +5913,17 @@ static int diff_opt_submodule(const struct option *opt,
 	return 0;
 }
 
+static int diff_opt_ext_diff(const struct option *opt,
+			     const char *arg, int unset)
+{
+	struct diff_options *options = opt->value;
+
+	BUG_ON_OPT_ARG(arg);
+	options->flags.allow_external = !unset;
+	options->flags.no_diff_process = unset;
+	return 0;
+}
+
 static int diff_opt_textconv(const struct option *opt,
 			     const char *arg, int unset)
 {
@@ -6241,8 +6252,9 @@ struct option *add_diff_options(const struct option *opts,
 			 N_("exit with 1 if there were differences, 0 otherwise")),
 		OPT_BOOL(0, "quiet", &options->flags.quick,
 			 N_("disable all output of the program")),
-		OPT_BOOL(0, "ext-diff", &options->flags.allow_external,
-			 N_("allow an external diff helper to be executed")),
+		OPT_CALLBACK_F(0, "ext-diff", options, NULL,
+			       N_("allow an external diff helper to be executed"),
+			       PARSE_OPT_NOARG, diff_opt_ext_diff),
 		OPT_CALLBACK_F(0, "textconv", options, NULL,
 			       N_("run external text conversion filters when comparing binary files"),
 			       PARSE_OPT_NOARG, diff_opt_textconv),
diff --git a/diff.h b/diff.h
index d1e5a13e9e..71ba389f03 100644
--- a/diff.h
+++ b/diff.h
@@ -173,7 +173,9 @@ struct diff_flags {
 	 */
 	unsigned allow_external;
 
-	/** Disables diff.<driver>.process. */
+	/**
+	 * Disables diff.<driver>.process.  Set by --no-ext-diff.
+	 */
 	unsigned no_diff_process;
 
 	/**
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index f159cd86d8..ee0c306abd 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -419,6 +419,22 @@ test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
 	test_path_is_missing backend.log
 '
 
+test_expect_success PYTHON 'diff process bypassed by --no-ext-diff' '
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff --no-ext-diff worddiff.c >actual &&
+	test_grep "return 999" actual &&
+	test_path_is_missing backend.log
+'
+
+test_expect_success PYTHON 'diff process not used by format-patch' '
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		format-patch -1 --stdout -- logtest.c >actual &&
+	test_grep "return 2" actual &&
+	test_path_is_missing backend.log
+'
+
 test_expect_success PYTHON 'diff process not used by --stat' '
 	rm -f backend.log &&
 	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 4/6] diff: add long-running diff process via diff.<driver>.process
From: Michael Montalbo via GitGitGadget @ 2026-05-29 20:48 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.v3.git.1780087700.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add support for external diff processes that communicate via the
long-running process protocol (pkt-line over stdin/stdout).

A diff process is configured per userdiff driver:

    [diff "cdiff"]
        process = /path/to/diff-tool

The tool provides custom line-matching: it receives file pairs
and returns hunks that reference line numbers in the content.
When textconv is also configured, the tool receives the
textconv-transformed content.  The tool controls which lines
are marked as changed while the display shows the file content.
Patch output features (word diff, function context, color) work
normally; summary formats like --stat use their own diff path
and are not affected.

The handshake negotiates version=1 and capability=hunks.  Per-file
requests send command=hunks, pathname, and both file contents as
packetized data.  The tool responds with hunk lines and a status
packet (success, error, or abort).  On error, Git warns and falls
back to the builtin diff algorithm for that file.  On abort, Git
silently falls back for the current file and stops sending further
requests to the tool for the remainder of the session.

When the tool returns no hunks followed by status=success, Git
treats the file as having no changes and produces no diff output.
This also means --exit-code reports no changes for that file.

The subprocess is stored on the userdiff_driver struct and
launched on first use.  If the process fails to start, the
handshake fails, or a communication error occurs mid-stream,
the failure is cached on the driver to avoid retrying and
re-warning on every subsequent file.

diff_process_fill_hunks() is the sole public entry point.  It
handles driver lookup, flag checks, subprocess management, and
error reporting, returning an enum that lets callers distinguish
"hunks populated" from "files equivalent" from "not applicable"
from "tool failure."

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 Documentation/config/diff.adoc   |   5 +
 Documentation/gitattributes.adoc | 139 ++++++++
 Makefile                         |   1 +
 diff-process.c                   | 288 +++++++++++++++++
 diff-process.h                   |  39 +++
 diff.c                           |  13 +
 diff.h                           |   3 +
 meson.build                      |   1 +
 t/.gitattributes                 |   1 +
 t/meson.build                    |   1 +
 t/t4080-diff-process.sh          | 538 +++++++++++++++++++++++++++++++
 userdiff.h                       |   3 +
 12 files changed, 1032 insertions(+)
 create mode 100644 diff-process.c
 create mode 100644 diff-process.h
 create mode 100755 t/t4080-diff-process.sh

diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc
index 1135a62a0a..ac0635bb3b 100644
--- a/Documentation/config/diff.adoc
+++ b/Documentation/config/diff.adoc
@@ -218,6 +218,11 @@ endif::git-diff[]
 	Set this option to `true` to make the diff driver cache the text
 	conversion outputs.  See linkgit:gitattributes[5] for details.
 
+`diff.<driver>.process`::
+	The command to run as a long-running diff process that
+	provides hunks to Git's diff pipeline.
+	See linkgit:gitattributes[5] for details.
+
 `diff.indentHeuristic`::
 	Set this option to `false` to disable the default heuristics
 	that shift diff hunk boundaries to make patches easier to read.
diff --git a/Documentation/gitattributes.adoc b/Documentation/gitattributes.adoc
index f20041a323..1700bd8e97 100644
--- a/Documentation/gitattributes.adoc
+++ b/Documentation/gitattributes.adoc
@@ -821,6 +821,145 @@ NOTE: If `diff.<name>.command` is defined for path with the
 (see above), and adding `diff.<name>.algorithm` has no effect, as the
 algorithm is not passed to the external diff driver.
 
+Using an external diff process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If `diff.<name>.process` is defined, Git sends the old and new file
+content to an external tool and receives back a list of changed
+regions (pairs of line ranges in the old and new file).  Git uses
+these instead of its builtin diff algorithm, but still controls
+all output formatting, so features like word diff, function context,
+color, and blame work normally.  This is achieved by using the
+long-running process protocol (described in
+Documentation/technical/long-running-process-protocol.adoc).
+Unlike `diff.<name>.command`, which replaces Git's output entirely,
+the diff process feeds results back into the standard pipeline.
+
+First, in `.gitattributes`, assign the `diff` attribute for paths.
+
+------------------------
+*.c diff=cdiff
+------------------------
+
+Then, define a "diff.<name>.process" configuration to specify
+the diff process command.
+
+----------------------------------------------------------------
+[diff "cdiff"]
+  process = /path/to/diff-process-tool
+----------------------------------------------------------------
+
+When Git encounters the first file that needs to be diffed, it starts
+the process and performs the handshake.  In the handshake, the welcome
+message sent by Git is "git-diff-client", only version 1 is supported,
+and the supported capability is "hunks" (the changed regions
+described below).
+
+For each file, Git sends a list of "key=value" pairs terminated with
+a flush packet, followed by the old and new file content as packetized
+data, each terminated with a flush packet.  The pathname is relative
+to the repository root.  When `diff.<name>.textconv` is also set,
+the tool receives the textconv-transformed content rather than the
+raw blob.  Git does not send binary files to the diff process.
+
+-----------------------
+packet:          git> command=hunks
+packet:          git> pathname=path/file.c
+packet:          git> 0000
+packet:          git> OLD_CONTENT
+packet:          git> 0000
+packet:          git> NEW_CONTENT
+packet:          git> 0000
+-----------------------
+
+The tool is expected to respond with zero or more hunk lines,
+a flush packet, and a status packet terminated with a flush packet.
+Each hunk line has the form:
+
+  `hunk <old_start> <old_count> <new_start> <new_count>`
+
+where `<old_start>` and `<old_count>` identify a range of lines in
+the old file, and `<new_start>` and `<new_count>` identify the
+replacement range in the new file.  Start values are 1-based and
+counts are non-negative.  Ranges must not extend beyond the end of
+the file.  For example, `hunk 3 2 3 4` means that 2 lines starting
+at line 3 in the old file were replaced by 4 lines starting at
+line 3 in the new file.  An `<old_count>` of 0 means no lines were
+removed (pure insertion); a `<new_count>` of 0 means no lines were
+added (pure deletion).
+
+Lines are delimited by newlines.  A file `"foo\nbar\n"` and a
+file `"foo\nbar"` both have 2 lines.
+
+Hunks must be listed in order and must not overlap.  Any line
+not covered by a hunk is treated as unchanged, so the total
+number of unchanged lines must be the same on both sides.
+For example, if the old file has 10 lines and the hunks cover
+4 of them (`old_count` values summing to 4), then 6 old lines
+are unchanged.  The new file must also have exactly 6 lines
+not covered by hunks, so the `new_count` values must sum to
+`new_file_lines - 6`.
+
+-----------------------
+packet:          git< hunk 1 3 1 5
+packet:          git< hunk 10 2 12 2
+packet:          git< 0000
+packet:          git< status=success
+packet:          git< 0000
+-----------------------
+
+If the tool responds with hunks and "success", Git marks those lines
+as changed and feeds them into the standard diff pipeline.  Patch
+output features (word diff, function context, color) work normally.
+Note that `--stat` and other summary formats use their own diff path
+and are not affected by the diff process.
+
+If no hunk lines precede the flush, followed by "success", Git
+treats the files as having no changes: `git diff` produces no output
+and `git blame` skips the commit, attributing lines to earlier commits.
+
+-----------------------
+packet:          git< 0000
+packet:          git< status=success
+packet:          git< 0000
+-----------------------
+
+If the tool returns invalid hunks (out of bounds, overlapping), Git
+silently falls back to the builtin diff algorithm.
+
+In case the tool cannot or does not want to process the content,
+it is expected to respond with an "error" status.  Git warns and
+falls back to the builtin diff algorithm for this file.  The tool
+remains available for subsequent files.
+
+-----------------------
+packet:          git< 0000
+packet:          git< status=error
+packet:          git< 0000
+-----------------------
+
+In case the tool cannot or does not want to process the content as
+well as any future content for the lifetime of the Git process, it
+is expected to respond with an "abort" status.  Git silently falls
+back to the builtin diff algorithm for this file and does not send
+further requests to the tool.
+
+-----------------------
+packet:          git< 0000
+packet:          git< status=abort
+packet:          git< 0000
+-----------------------
+
+If the tool dies during the communication or does not adhere to the
+protocol then Git will stop the process and fall back to the builtin
+diff algorithm.  Git warns once and does not restart the process for
+subsequent files.
+
+Tools should ignore unknown keys in the per-file request to remain
+forward-compatible.  Future versions of Git may send additional
+`command=` values; tools that receive an unrecognized command should
+respond with `status=error` rather than terminating.
+
 Defining a custom hunk-header
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/Makefile b/Makefile
index cedc234173..22900368dd 100644
--- a/Makefile
+++ b/Makefile
@@ -1142,6 +1142,7 @@ LIB_OBJS += diff-delta.o
 LIB_OBJS += diff-merges.o
 LIB_OBJS += diff-lib.o
 LIB_OBJS += diff-no-index.o
+LIB_OBJS += diff-process.o
 LIB_OBJS += diff.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
diff --git a/diff-process.c b/diff-process.c
new file mode 100644
index 0000000000..d2ef9463d7
--- /dev/null
+++ b/diff-process.c
@@ -0,0 +1,288 @@
+/*
+ * Diff process backend: communicates with a long-running external
+ * tool via the pkt-line protocol to obtain custom line-matching
+ * results.  The tool controls which lines are marked as changed
+ * while the display shows the file content (after any textconv
+ * transformation, if configured).
+ *
+ * Protocol: pkt-line over stdin/stdout, following the pattern of
+ * the long-running filter process protocol (see convert.c).
+ *
+ * Handshake:
+ *   git> git-diff-client / version=1 / flush
+ *   tool< git-diff-server / version=1 / flush
+ *   git> capability=hunks / flush
+ *   tool< capability=hunks / flush
+ *
+ * Per-file:
+ *   git> command=hunks / pathname=<path> / flush
+ *   git> <old content packetized> / flush
+ *   git> <new content packetized> / flush
+ *   tool< hunk <old_start> <old_count> <new_start> <new_count>
+ *   tool< ... / flush
+ *   tool< status=success / flush
+ *
+ * When the tool returns no hunks with status=success, it considers
+ * the files equivalent.  Git will skip the diff for that file.
+ */
+
+#include "git-compat-util.h"
+#include "diff-process.h"
+#include "diff.h"
+#include "gettext.h"
+#include "repository.h"
+#include "sigchain.h"
+#include "userdiff.h"
+#include "sub-process.h"
+#include "pkt-line.h"
+#include "strbuf.h"
+#include "xdiff/xdiff.h"
+
+#define CAP_HUNKS (1u << 0)
+
+struct diff_subprocess {
+	struct subprocess_entry subprocess;
+	unsigned int supported_capabilities;
+};
+
+static int start_diff_process_fn(struct subprocess_entry *subprocess)
+{
+	static int versions[] = { 1, 0 };
+	static struct subprocess_capability capabilities[] = {
+		{ "hunks", CAP_HUNKS },
+		{ NULL, 0 }
+	};
+	struct diff_subprocess *entry =
+		container_of(subprocess, struct diff_subprocess, subprocess);
+
+	return subprocess_handshake(subprocess, "git-diff",
+				    versions, NULL,
+				    capabilities,
+				    &entry->supported_capabilities);
+}
+
+static struct diff_subprocess *get_or_launch_process(
+		struct userdiff_driver *drv)
+{
+	struct diff_subprocess *entry;
+
+	if (drv->diff_subprocess)
+		return drv->diff_subprocess;
+
+	entry = xcalloc(1, sizeof(*entry));
+	if (subprocess_start_command(&entry->subprocess, drv->process,
+				     start_diff_process_fn)) {
+		free(entry);
+		drv->diff_process_failed = 1;
+		return NULL;
+	}
+
+	drv->diff_subprocess = entry;
+	return entry;
+}
+
+static int send_file_content(int fd, const char *buf, long size)
+{
+	int ret = 0;
+
+	if (size < 0)
+		return -1;
+	if (size > 0)
+		ret = write_packetized_from_buf_no_flush(buf, size, fd);
+	if (ret)
+		return ret;
+	return packet_flush_gently(fd);
+}
+
+static int parse_hunk_line(const char *line, struct xdl_hunk *hunk)
+{
+	char *end;
+
+	/*
+	 * Format: "hunk <old_start> <old_count> <new_start> <new_count>"
+	 * All numbers must be non-negative decimal with no leading
+	 * whitespace or sign characters.
+	 */
+	if (!skip_prefix(line, "hunk ", &line))
+		return -1;
+
+	if (!isdigit(*line))
+		return -1;
+	errno = 0;
+	hunk->old_start = strtol(line, &end, 10);
+	if (errno || end == line || *end++ != ' ')
+		return -1;
+	line = end;
+
+	if (!isdigit(*line))
+		return -1;
+	errno = 0;
+	hunk->old_count = strtol(line, &end, 10);
+	if (errno || end == line || *end++ != ' ')
+		return -1;
+	line = end;
+
+	if (!isdigit(*line))
+		return -1;
+	errno = 0;
+	hunk->new_start = strtol(line, &end, 10);
+	if (errno || end == line || *end++ != ' ')
+		return -1;
+	line = end;
+
+	if (!isdigit(*line))
+		return -1;
+	errno = 0;
+	hunk->new_count = strtol(line, &end, 10);
+	if (errno || end == line || *end != '\0')
+		return -1;
+
+	return 0;
+}
+
+static enum diff_process_result get_hunks(
+		struct userdiff_driver *drv,
+		const char *path,
+		const char *old_buf, long old_size,
+		const char *new_buf, long new_size,
+		struct xdl_hunk **hunks_out,
+		size_t *nr_hunks_out)
+{
+	struct diff_subprocess *backend;
+	struct child_process *process;
+	int fd_in, fd_out;
+	struct strbuf status = STRBUF_INIT;
+	struct xdl_hunk *hunks = NULL;
+	struct xdl_hunk hunk;
+	size_t nr_hunks = 0, alloc_hunks = 0;
+	int len;
+	char *line;
+
+	backend = get_or_launch_process(drv);
+	if (!backend)
+		return DIFF_PROCESS_ERROR;
+
+	if (!(backend->supported_capabilities & CAP_HUNKS))
+		return DIFF_PROCESS_SKIP;
+
+	process = subprocess_get_child_process(&backend->subprocess);
+	fd_in = process->in;
+	fd_out = process->out;
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	/* Send request */
+	if (packet_write_fmt_gently(fd_in, "command=hunks\n") ||
+	    packet_write_fmt_gently(fd_in, "pathname=%s\n", path) ||
+	    packet_flush_gently(fd_in))
+		goto comm_error;
+
+	/* Send old file content */
+	if (send_file_content(fd_in, old_buf, old_size))
+		goto comm_error;
+
+	/* Send new file content */
+	if (send_file_content(fd_in, new_buf, new_size))
+		goto comm_error;
+
+	/* Read hunks until flush packet */
+	while ((len = packet_read_line_gently(fd_out, NULL, &line)) >= 0 &&
+	       line) {
+		if (parse_hunk_line(line, &hunk) < 0)
+			goto comm_error;
+		ALLOC_GROW(hunks, nr_hunks + 1, alloc_hunks);
+		hunks[nr_hunks++] = hunk;
+	}
+	if (len < 0)
+		goto comm_error;
+
+	/* Read status */
+	if (subprocess_read_status(fd_out, &status))
+		goto comm_error;
+
+	if (!strcmp(status.buf, "success")) {
+		*hunks_out = hunks;
+		*nr_hunks_out = nr_hunks;
+		strbuf_release(&status);
+		sigchain_pop(SIGPIPE);
+		return DIFF_PROCESS_OK;
+	}
+
+	if (!strcmp(status.buf, "abort")) {
+		/*
+		 * The tool voluntarily withdrew: stop sending requests
+		 * but do not warn (this is not a failure).
+		 */
+		backend->supported_capabilities &= ~CAP_HUNKS;
+		free(hunks);
+		strbuf_release(&status);
+		sigchain_pop(SIGPIPE);
+		return DIFF_PROCESS_SKIP;
+	}
+
+	/* status=error or unknown status */
+	free(hunks);
+	strbuf_release(&status);
+	sigchain_pop(SIGPIPE);
+	return DIFF_PROCESS_ERROR;
+
+comm_error:
+	/*
+	 * Communication failure (broken pipe, malformed response).
+	 * Tear down the process and mark as failed so we do not
+	 * retry on every subsequent file.
+	 */
+	drv->diff_process_failed = 1;
+	drv->diff_subprocess = NULL;
+	subprocess_stop_command(&backend->subprocess);
+	free(backend);
+	free(hunks);
+	strbuf_release(&status);
+	sigchain_pop(SIGPIPE);
+	return DIFF_PROCESS_ERROR;
+}
+
+enum diff_process_result diff_process_fill_hunks(
+		struct diff_options *diffopt,
+		const char *path,
+		const mmfile_t *file_a,
+		const mmfile_t *file_b,
+		xpparam_t *xpp)
+{
+	struct userdiff_driver *drv;
+	struct xdl_hunk *ext_hunks = NULL;
+	size_t nr = 0;
+	enum diff_process_result res;
+
+	if (!diffopt || !path)
+		return DIFF_PROCESS_SKIP;
+	if (diffopt->flags.no_diff_process || diffopt->ignore_driver_algorithm)
+		return DIFF_PROCESS_SKIP;
+
+	drv = userdiff_find_by_path(diffopt->repo->index, path);
+	if (!drv || !drv->process)
+		return DIFF_PROCESS_SKIP;
+	if (drv->diff_process_failed)
+		return DIFF_PROCESS_SKIP;
+
+	res = get_hunks(drv, path,
+			file_a->ptr, file_a->size,
+			file_b->ptr, file_b->size,
+			&ext_hunks, &nr);
+	if (res == DIFF_PROCESS_OK) {
+		if (!nr) {
+			free(ext_hunks);
+			return DIFF_PROCESS_EQUIVALENT;
+		}
+		xpp->external_hunks = ext_hunks;
+		xpp->external_hunks_nr = nr;
+		return DIFF_PROCESS_OK;
+	}
+	if (res == DIFF_PROCESS_ERROR) {
+		warning(_("diff process '%s' failed for '%s',"
+			  " falling back to builtin diff"),
+			drv->process, path);
+		return DIFF_PROCESS_ERROR;
+	}
+	return DIFF_PROCESS_SKIP;
+}
diff --git a/diff-process.h b/diff-process.h
new file mode 100644
index 0000000000..d34b42f811
--- /dev/null
+++ b/diff-process.h
@@ -0,0 +1,39 @@
+#ifndef DIFF_PROCESS_H
+#define DIFF_PROCESS_H
+
+#include "xdiff/xdiff.h"
+
+struct diff_options;
+
+enum diff_process_result {
+	DIFF_PROCESS_ERROR = -1, /* tool failure: warned, fell back */
+	DIFF_PROCESS_OK = 0,     /* hunks populated in xpp */
+	DIFF_PROCESS_SKIP,       /* no process configured: use builtin */
+	DIFF_PROCESS_EQUIVALENT, /* tool says files are equivalent */
+};
+
+/*
+ * Consult the diff process configured for 'path' and populate
+ * xpp->external_hunks with the returned hunks.
+ *
+ * Handles driver lookup, flag checks (--no-ext-diff,
+ * --diff-algorithm), subprocess management, and error reporting.
+ *
+ * Returns DIFF_PROCESS_OK when hunks are populated in xpp.
+ * The caller owns xpp->external_hunks and must free() it.
+ *
+ * Returns DIFF_PROCESS_EQUIVALENT when the tool returns no hunks
+ * (files are considered identical); caller should skip diff/blame.
+ * Returns DIFF_PROCESS_SKIP when no process applies; caller
+ * should use the builtin diff algorithm.
+ * Returns DIFF_PROCESS_ERROR on tool failure (already warned);
+ * caller should fall back to the builtin diff algorithm.
+ */
+enum diff_process_result diff_process_fill_hunks(
+		struct diff_options *diffopt,
+		const char *path,
+		const mmfile_t *file_a,
+		const mmfile_t *file_b,
+		xpparam_t *xpp);
+
+#endif /* DIFF_PROCESS_H */
diff --git a/diff.c b/diff.c
index 397e38b41c..2d5ed6ea8c 100644
--- a/diff.c
+++ b/diff.c
@@ -25,6 +25,7 @@
 #include "utf8.h"
 #include "odb.h"
 #include "userdiff.h"
+#include "diff-process.h"
 #include "submodule.h"
 #include "hashmap.h"
 #include "mem-pool.h"
@@ -4031,6 +4032,17 @@ static void builtin_diff(const char *name_a,
 		xpp.ignore_regex_nr = o->ignore_regex_nr;
 		xpp.anchors = o->anchors;
 		xpp.anchors_nr = o->anchors_nr;
+
+		if (diff_process_fill_hunks(o, name_a,
+					    &mf1, &mf2, &xpp)
+		    == DIFF_PROCESS_EQUIVALENT) {
+			if (textconv_one)
+				free(mf1.ptr);
+			if (textconv_two)
+				free(mf2.ptr);
+			goto free_ab_and_return;
+		}
+
 		xecfg.ctxlen = o->context;
 		xecfg.interhunkctxlen = o->interhunkcontext;
 		xecfg.flags = XDL_EMIT_FUNCNAMES;
@@ -4111,6 +4123,7 @@ static void builtin_diff(const char *name_a,
 		} else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
 					 &ecbdata, &xpp, &xecfg))
 			die("unable to generate diff for %s", one->path);
+		free(xpp.external_hunks);
 		if (o->word_diff)
 			free_diff_words_data(&ecbdata);
 		if (textconv_one)
diff --git a/diff.h b/diff.h
index 7eb84aadf4..d1e5a13e9e 100644
--- a/diff.h
+++ b/diff.h
@@ -173,6 +173,9 @@ struct diff_flags {
 	 */
 	unsigned allow_external;
 
+	/** Disables diff.<driver>.process. */
+	unsigned no_diff_process;
+
 	/**
 	 * For communication between the calling program and the options parser;
 	 * tell the calling program to signal the presence of difference using
diff --git a/meson.build b/meson.build
index 11488623bf..8a7370b38f 100644
--- a/meson.build
+++ b/meson.build
@@ -328,6 +328,7 @@ libgit_sources = [
   'diff-merges.c',
   'diff-lib.c',
   'diff-no-index.c',
+  'diff-process.c',
   'diff.c',
   'diffcore-break.c',
   'diffcore-delta.c',
diff --git a/t/.gitattributes b/t/.gitattributes
index 7664c6e027..de97920cab 100644
--- a/t/.gitattributes
+++ b/t/.gitattributes
@@ -23,3 +23,4 @@ t[0-9][0-9][0-9][0-9]/* -whitespace
 /t8005/*.txt eol=lf
 /t9*/*.dump eol=lf
 /t0040*.sh whitespace=-indent-with-non-tab
+/t4080-diff-process.sh whitespace=-indent-with-non-tab
diff --git a/t/meson.build b/t/meson.build
index 7528e5cda5..f67208d7ee 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -510,6 +510,7 @@ integration_tests = [
   't4072-diff-max-depth.sh',
   't4073-diff-stat-name-width.sh',
   't4074-diff-shifted-matched-group.sh',
+  't4080-diff-process.sh',
   't4100-apply-stat.sh',
   't4101-apply-nonl.sh',
   't4102-apply-rename.sh',
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
new file mode 100755
index 0000000000..f159cd86d8
--- /dev/null
+++ b/t/t4080-diff-process.sh
@@ -0,0 +1,538 @@
+#!/bin/sh
+
+test_description='diff process via long-running process'
+
+. ./test-lib.sh
+
+if test_have_prereq PYTHON
+then
+	PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
+fi
+
+#
+# A single parametric diff process.
+# Usage: diff-process-backend --mode=<mode> [--log=<path>]
+#
+# Modes:
+#   whole-file  - report all lines as changed (default)
+#   fixed-hunk  - always report hunk 5 2 5 2
+#   bad-hunk    - report out-of-bounds hunk 999 1 999 1
+#   bad-sync    - report hunk with mismatched unchanged totals
+#   overlap     - report two overlapping hunks
+#   no-hunks   - return no hunks (files considered equivalent)
+#   error       - return status=error for every request
+#   abort       - return status=abort for every request
+#   crash       - read one request then exit without responding
+#
+setup_backend () {
+	cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF
+	import sys, os
+
+	def read_pkt():
+	    hdr = sys.stdin.buffer.read(4)
+	    if len(hdr) < 4: return None
+	    length = int(hdr, 16)
+	    if length == 0: return ""
+	    data = sys.stdin.buffer.read(length - 4)
+	    return data.decode().rstrip("\n")
+
+	def write_pkt(line):
+	    data = (line + "\n").encode()
+	    sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
+	    sys.stdout.buffer.flush()
+
+	def write_flush():
+	    sys.stdout.buffer.write(b"0000")
+	    sys.stdout.buffer.flush()
+
+	def read_content():
+	    chunks = []
+	    while True:
+	        hdr = sys.stdin.buffer.read(4)
+	        if len(hdr) < 4: break
+	        length = int(hdr, 16)
+	        if length == 0: break
+	        chunks.append(sys.stdin.buffer.read(length - 4))
+	    return b"".join(chunks)
+
+	mode = "whole-file"
+	logfile = None
+	for arg in sys.argv[1:]:
+	    if arg.startswith("--mode="):
+	        mode = arg[7:]
+	    elif arg.startswith("--log="):
+	        logfile = open(arg[6:], "a")
+
+	def log(msg):
+	    if logfile:
+	        logfile.write(msg + "\n")
+	        logfile.flush()
+
+	# Handshake
+	assert read_pkt() == "git-diff-client"
+	assert read_pkt() == "version=1"
+	read_pkt()
+	write_pkt("git-diff-server")
+	write_pkt("version=1")
+	write_flush()
+	while True:
+	    p = read_pkt()
+	    if p == "": break
+	write_pkt("capability=hunks")
+	write_flush()
+
+	log("ready")
+
+	while True:
+	    cmd = None
+	    pathname = None
+	    while True:
+	        p = read_pkt()
+	        if p is None: sys.exit(0)
+	        if p == "": break
+	        if p.startswith("command="): cmd = p.split("=",1)[1]
+	        if p.startswith("pathname="): pathname = p.split("=",1)[1]
+	    if cmd is None: sys.exit(0)
+	    old = read_content()
+	    new = read_content()
+	    old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
+	    new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
+	    log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
+
+	    if mode == "error":
+	        write_flush()
+	        write_pkt("status=error")
+	        write_flush()
+	        continue
+
+	    if mode == "abort":
+	        write_flush()
+	        write_pkt("status=abort")
+	        write_flush()
+	        continue
+
+	    if mode == "crash":
+	        sys.exit(1)
+
+	    if cmd == "hunks":
+	        if mode == "fixed-hunk":
+	            write_pkt("hunk 5 2 5 2")
+	        elif mode == "bad-hunk":
+	            write_pkt("hunk 999 1 999 1")
+	        elif mode == "bad-sync":
+	            write_pkt("hunk 1 2 1 1")
+	        elif mode == "overlap":
+	            write_pkt("hunk 1 5 1 5")
+	            write_pkt("hunk 3 2 3 2")
+	        elif mode == "no-hunks":
+	            pass
+	        else:
+	            ol = old.count(b"\n")
+	            nl = new.count(b"\n")
+	            write_pkt(f"hunk 1 {ol} 1 {nl}")
+	        write_flush()
+	        write_pkt("status=success")
+	        write_flush()
+	    else:
+	        write_flush()
+	        write_pkt("status=error")
+	        write_flush()
+	PYEOF
+	write_script diff-process-backend <<-SHEOF
+	exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@"
+	SHEOF
+}
+
+BACKEND="./diff-process-backend"
+
+test_expect_success PYTHON 'setup' '
+	setup_backend &&
+	echo "*.c diff=cdiff" >.gitattributes &&
+	git add .gitattributes &&
+
+	# boundary.c: 10 lines, changes at 5-6 and 9-10.
+	# Used by: hunk boundaries, error fallback, crash, bad hunks, overlap.
+	cat >boundary.c <<-\EOF &&
+	line1
+	line2
+	line3
+	line4
+	OLD5
+	OLD6
+	line7
+	line8
+	OLD9
+	OLD10
+	EOF
+	git add boundary.c &&
+
+	# worddiff.c: single-line function, value changes 1 -> 999.
+	# Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat.
+	cat >worddiff.c <<-\EOF &&
+	int value(void) { return 1; }
+	EOF
+	git add worddiff.c &&
+
+	# newfile.c: single-line function, value changes 42 -> 99.
+	# Used by: new file, --exit-code, multiple drivers.
+	cat >newfile.c <<-\EOF &&
+	int new_func(void) { return 42; }
+	EOF
+	git add newfile.c &&
+
+	# logtest.c: single-line function for log/format-patch tests.
+	# Needs two commits so log -1 has a diff.
+	cat >logtest.c <<-\EOF &&
+	int logfunc(void) { return 1; }
+	EOF
+	git add logtest.c &&
+
+	# two.c/one.c: two-file pair for error/abort/startup-failure tests.
+	cat >one.c <<-\EOF &&
+	int first(void) { return 1; }
+	EOF
+	cat >two.c <<-\EOF &&
+	int second(void) { return 2; }
+	EOF
+	git add one.c two.c &&
+
+	git commit -m "initial" &&
+
+	# Second commit for logtest.c (so log -1 has something to show).
+	cat >logtest.c <<-\EOF &&
+	int logfunc(void) { return 2; }
+	EOF
+	git add logtest.c &&
+	git commit -m "change logtest.c" &&
+
+	# Working tree modifications (not committed).
+	cat >boundary.c <<-\EOF &&
+	line1
+	line2
+	line3
+	line4
+	NEW5
+	NEW6
+	line7
+	line8
+	NEW9
+	NEW10
+	EOF
+
+	cat >worddiff.c <<-\EOF &&
+	int value(void) { return 999; }
+	EOF
+
+	cat >newfile.c <<-\EOF &&
+	int new_func(void) { return 99; }
+	EOF
+
+	cat >one.c <<-\EOF &&
+	int first(void) { return 10; }
+	EOF
+
+	cat >two.c <<-\EOF
+	int second(void) { return 20; }
+	EOF
+'
+
+#
+# Core behavior: the tool controls which lines are marked as changed.
+#
+
+test_expect_success PYTHON 'diff process hunk boundaries affect output' '
+	# The file has changes at lines 5-6 and 9-10, but fixed-hunk
+	# only reports lines 5-6 as changed.  Lines 9-10 should not
+	# appear as changed in the output.
+	git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
+		diff boundary.c >actual &&
+	test_grep "^-OLD5" actual &&
+	test_grep "^-OLD6" actual &&
+	test_grep "^+NEW5" actual &&
+	test_grep "^+NEW6" actual &&
+	test_grep ! "^-OLD9" actual &&
+	test_grep ! "^-OLD10" actual &&
+	test_grep ! "^+NEW9" actual &&
+	test_grep ! "^+NEW10" actual
+'
+
+test_expect_success PYTHON 'diff process works with new file' '
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff -- newfile.c >actual 2>stderr &&
+	test_grep "return 99" actual &&
+	test_grep "pathname=newfile.c" backend.log &&
+	test_must_be_empty stderr
+'
+
+test_expect_success PYTHON 'diff process works with added file (empty old side)' '
+	cat >added.c <<-\EOF &&
+	int added(void) { return 1; }
+	EOF
+	git add added.c &&
+
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff --cached -- added.c >actual 2>stderr &&
+	test_grep "added" actual &&
+	test_grep "pathname=added.c" backend.log &&
+	test_must_be_empty stderr
+'
+
+test_expect_success PYTHON 'diff process skipped for binary files' '
+	printf "\\0binary" >binary.c &&
+	git add binary.c &&
+	git commit -m "add binary" &&
+	printf "\\0changed" >binary.c &&
+
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff -- binary.c >actual &&
+	test_grep "Binary files" actual &&
+	test_path_is_missing backend.log
+'
+
+test_expect_success PYTHON 'diff process not consulted for unmatched driver' '
+	echo "not tracked by cdiff" >unmatched.txt &&
+	git add unmatched.txt &&
+	git commit -m "add unmatched.txt" &&
+
+	echo "modified" >unmatched.txt &&
+
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff -- unmatched.txt >actual &&
+	test_grep "modified" actual &&
+	test_path_is_missing backend.log
+'
+
+test_expect_success PYTHON 'multiple drivers use separate processes' '
+	echo "*.h diff=hdiff" >>.gitattributes &&
+	git add .gitattributes &&
+
+	cat >multi.h <<-\EOF &&
+	int header(void) { return 1; }
+	EOF
+	git add multi.h &&
+	git commit -m "add multi.h" &&
+
+	cat >multi.h <<-\EOF &&
+	int header(void) { return 2; }
+	EOF
+
+	rm -f backend-c.log backend-h.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \
+	    -c diff.hdiff.process="$BACKEND --log=backend-h.log" \
+		diff -- newfile.c multi.h >actual 2>stderr &&
+	test_grep "pathname=newfile.c" backend-c.log &&
+	test_grep "pathname=multi.h" backend-h.log &&
+	test_must_be_empty stderr
+'
+
+test_expect_success PYTHON 'diff process works alongside textconv' '
+	write_script uppercase-filter <<-\EOF &&
+	tr "a-z" "A-Z" <"$1"
+	EOF
+
+	cat >textconv.c <<-\EOF &&
+	hello world
+	EOF
+	git add textconv.c &&
+	git commit -m "add textconv.c" &&
+
+	cat >textconv.c <<-\EOF &&
+	goodbye world
+	EOF
+
+	rm -f backend.log &&
+	git -c diff.cdiff.textconv="./uppercase-filter" \
+	    -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff -- textconv.c >actual 2>stderr &&
+	# The diff process receives textconv-transformed (uppercase) content.
+	test_grep "pathname=textconv.c" backend.log &&
+	test_grep "old=HELLO WORLD" backend.log &&
+	test_grep "new=GOODBYE WORLD" backend.log &&
+	test_must_be_empty stderr
+'
+
+#
+# Downstream features: word diff, log, equivalent files, exit code.
+#
+
+test_expect_success PYTHON 'diff process with --word-diff' '
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff --word-diff worddiff.c >actual 2>stderr &&
+	test_grep "\[-1;-\]" actual &&
+	test_grep "{+999;+}" actual &&
+	test_grep "pathname=worddiff.c" backend.log &&
+	test_must_be_empty stderr
+'
+
+test_expect_success PYTHON 'diff process works with git log -p' '
+	# With no-hunks mode, the tool says the files are equivalent,
+	# so log -p should show the commit but no diff content.
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
+		log -1 -p -- logtest.c >actual 2>stderr &&
+	test_grep "change logtest.c" actual &&
+	test_grep ! "return 2" actual &&
+	test_grep "command=hunks pathname=logtest.c" backend.log &&
+	test_must_be_empty stderr
+'
+
+test_expect_success PYTHON 'diff process no hunks suppresses diff output' '
+	cat >nohunks.c <<-\EOF &&
+	int zero(void) { return 0; }
+	EOF
+	git add nohunks.c &&
+	git commit -m "add nohunks.c" &&
+
+	cat >nohunks.c <<-\EOF &&
+	int zero(void) { return 999; }
+	EOF
+
+	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
+		diff nohunks.c >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' '
+	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
+		diff --exit-code nohunks.c
+'
+
+test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' '
+	test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \
+		diff --exit-code newfile.c
+'
+
+#
+# Bypass mechanisms: flags and commands that skip the diff process.
+#
+
+test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff --diff-algorithm=patience worddiff.c >actual &&
+	test_grep "return 999" actual &&
+	test_path_is_missing backend.log
+'
+
+test_expect_success PYTHON 'diff process not used by --stat' '
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
+		diff --stat worddiff.c >actual &&
+	test_grep "worddiff.c" actual &&
+	test_path_is_missing backend.log
+'
+
+#
+# Error handling and fallback.
+#
+
+test_expect_success PYTHON 'diff process fallback on tool error status' '
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
+		diff boundary.c >actual 2>stderr &&
+	# Fallback produces the full builtin diff (both change regions).
+	test_grep "^-OLD5" actual &&
+	test_grep "^+NEW5" actual &&
+	test_grep "^-OLD9" actual &&
+	test_grep "^+NEW9" actual &&
+	# Tool was contacted (it replied with error, not crash).
+	test_grep "command=hunks pathname=boundary.c" backend.log &&
+	test_grep "diff process.*failed" stderr
+'
+
+test_expect_success PYTHON 'diff process error keeps tool available for next file' '
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
+		diff -- one.c two.c >actual 2>stderr &&
+	# Unlike abort, error keeps the tool available: both files
+	# are sent to the tool (and both fall back).
+	test_grep "pathname=one.c" backend.log &&
+	test_grep "pathname=two.c" backend.log &&
+	test_grep "return 10" actual &&
+	test_grep "return 20" actual
+'
+
+test_expect_success PYTHON 'diff process abort disables for session' '
+	rm -f backend.log &&
+	git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
+		diff -- one.c two.c >actual &&
+	# Both files should still produce diff output via fallback.
+	test_grep "return 10" actual &&
+	test_grep "return 20" actual &&
+	# The tool aborts on the first file and git clears its
+	# capability.  The second file never contacts the tool.
+	test_grep "pathname=one.c" backend.log &&
+	test_grep ! "pathname=two.c" backend.log
+'
+
+test_expect_success PYTHON 'diff process fallback on tool crash' '
+	git -c diff.cdiff.process="$BACKEND --mode=crash" \
+		diff boundary.c >actual 2>stderr &&
+	test_grep "^-OLD5" actual &&
+	test_grep "^+NEW5" actual &&
+	test_grep "^-OLD9" actual &&
+	test_grep "^+NEW9" actual &&
+	# Crash is a communication failure, so a warning is emitted.
+	test_grep "diff process.*failed" stderr
+'
+
+test_expect_success PYTHON 'diff process startup failure only warns once' '
+	git -c diff.cdiff.process="/nonexistent/tool" \
+		diff -- one.c two.c >actual 2>stderr &&
+	# Both files produce diff output via fallback.
+	test_grep "return 10" actual &&
+	test_grep "return 20" actual &&
+	# Sentinel prevents repeated warnings: only one, not one per file.
+	test_grep "diff process.*failed" stderr >warnings &&
+	test_line_count = 1 warnings
+'
+
+test_expect_success PYTHON 'diff process fallback on bad hunks' '
+	git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
+		diff boundary.c >actual 2>stderr &&
+	test_grep "^-OLD5" actual &&
+	test_grep "^+NEW5" actual &&
+	test_grep "^-OLD9" actual &&
+	test_grep "^+NEW9" actual &&
+	# Invalid hunks are caught by xdiff validation, not the
+	# protocol layer, so no warning is emitted.
+	test_must_be_empty stderr
+'
+
+test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' '
+	cat >synctest.c <<-\EOF &&
+	line1
+	line2
+	line3
+	EOF
+	git add synctest.c &&
+	git commit -m "add synctest.c" &&
+
+	cat >synctest.c <<-\EOF &&
+	line1
+	changed
+	line3
+	EOF
+
+	# bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new
+	# line as changed, leaving 1 unchanged old vs 2 unchanged new.
+	# The synchronization invariant fails and git falls back.
+	git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \
+		diff synctest.c >actual 2>stderr &&
+	test_grep "changed" actual
+'
+
+test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
+	# boundary.c has 10 lines, so both hunks are in bounds
+	# but they overlap at lines 3-5, triggering the ordering check.
+	git -c diff.cdiff.process="$BACKEND --mode=overlap" \
+		diff boundary.c >actual 2>stderr &&
+	test_grep "NEW5" actual
+'
+
+test_done
diff --git a/userdiff.h b/userdiff.h
index 51c26e0d41..a98eabe377 100644
--- a/userdiff.h
+++ b/userdiff.h
@@ -3,6 +3,7 @@
 
 #include "notes-cache.h"
 
+struct diff_subprocess;
 struct index_state;
 struct repository;
 
@@ -33,6 +34,8 @@ struct userdiff_driver {
 	int textconv_want_cache;
 	const char *process;
 	char *process_owned;
+	struct diff_subprocess *diff_subprocess;
+	unsigned diff_process_failed : 1;
 };
 enum userdiff_driver_type {
 	USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 3/6] sub-process: separate process lifecycle from hashmap management
From: Michael Montalbo via GitGitGadget @ 2026-05-29 20:48 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.v3.git.1780087700.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

subprocess_start() and subprocess_stop() couple two concerns:
managing a child process (setup, handshake, teardown) and
managing a hashmap that indexes running processes by command
string.  The hashmap suits callers like convert.c where many
files may share one filter process looked up by name, but
callers that manage process lifetime through their own data
structures do not need it.

Extract subprocess_start_command() and subprocess_stop_command()
so callers can reuse the child process setup and handshake
machinery without maintaining a hashmap.  subprocess_start()
and subprocess_stop() become thin wrappers that add hashmap
operations on top.

No functional change for existing callers.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 sub-process.c | 29 ++++++++++++++++++++++++-----
 sub-process.h |  9 ++++++++-
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/sub-process.c b/sub-process.c
index 83bf0a0e82..33b0bbc831 100644
--- a/sub-process.c
+++ b/sub-process.c
@@ -49,7 +49,7 @@ int subprocess_read_status(int fd, struct strbuf *status)
 	return (len < 0) ? len : 0;
 }
 
-void subprocess_stop(struct hashmap *hashmap, struct subprocess_entry *entry)
+void subprocess_stop_command(struct subprocess_entry *entry)
 {
 	if (!entry)
 		return;
@@ -57,7 +57,14 @@ void subprocess_stop(struct hashmap *hashmap, struct subprocess_entry *entry)
 	entry->process.clean_on_exit = 0;
 	kill(entry->process.pid, SIGTERM);
 	finish_command(&entry->process);
+}
+
+void subprocess_stop(struct hashmap *hashmap, struct subprocess_entry *entry)
+{
+	if (!entry)
+		return;
 
+	subprocess_stop_command(entry);
 	hashmap_remove(hashmap, &entry->ent, NULL);
 }
 
@@ -72,7 +79,7 @@ static void subprocess_exit_handler(struct child_process *process)
 	finish_command(process);
 }
 
-int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd,
+int subprocess_start_command(struct subprocess_entry *entry, const char *cmd,
 	subprocess_start_fn startfn)
 {
 	int err;
@@ -96,15 +103,27 @@ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, co
 		return err;
 	}
 
-	hashmap_entry_init(&entry->ent, strhash(cmd));
-
 	err = startfn(entry);
 	if (err) {
 		error("initialization for subprocess '%s' failed", cmd);
-		subprocess_stop(hashmap, entry);
+		subprocess_stop_command(entry);
 		return err;
 	}
 
+	return 0;
+}
+
+int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd,
+	subprocess_start_fn startfn)
+{
+	int err;
+
+	err = subprocess_start_command(entry, cmd, startfn);
+	if (err) {
+		return err;
+	}
+
+	hashmap_entry_init(&entry->ent, strhash(cmd));
 	hashmap_add(hashmap, &entry->ent);
 	return 0;
 }
diff --git a/sub-process.h b/sub-process.h
index bfc3959a1b..45f1b8e5e3 100644
--- a/sub-process.h
+++ b/sub-process.h
@@ -52,10 +52,17 @@ int cmd2process_cmp(const void *unused_cmp_data,
  */
 typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
 
-/* Start a subprocess and add it to the subprocess hashmap. */
+/* Start a subprocess and run the startfn (typically handshake). */
+int subprocess_start_command(struct subprocess_entry *entry, const char *cmd,
+		subprocess_start_fn startfn);
+
+/* Start a subprocess, run startfn, and add it to the subprocess hashmap. */
 int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd,
 		subprocess_start_fn startfn);
 
+/* Kill a subprocess. */
+void subprocess_stop_command(struct subprocess_entry *entry);
+
 /* Kill a subprocess and remove it from the subprocess hashmap. */
 void subprocess_stop(struct hashmap *hashmap, struct subprocess_entry *entry);
 
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 2/6] userdiff: add diff.<driver>.process config
From: Michael Montalbo via GitGitGadget @ 2026-05-29 20:48 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.v3.git.1780087700.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add the process field to struct userdiff_driver and teach the
config parser to populate it from diff.<driver>.process.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 userdiff.c | 7 +++++++
 userdiff.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/userdiff.c b/userdiff.c
index fe710a68bf..81c0bebcce 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -499,6 +499,13 @@ int userdiff_config(const char *k, const char *v)
 		drv->algorithm = drv->algorithm_owned;
 		return ret;
 	}
+	if (!strcmp(type, "process")) {
+		int ret;
+		FREE_AND_NULL(drv->process_owned);
+		ret = git_config_string(&drv->process_owned, k, v);
+		drv->process = drv->process_owned;
+		return ret;
+	}
 
 	return 0;
 }
diff --git a/userdiff.h b/userdiff.h
index 827361b0bc..51c26e0d41 100644
--- a/userdiff.h
+++ b/userdiff.h
@@ -31,6 +31,8 @@ struct userdiff_driver {
 	char *textconv_owned;
 	struct notes_cache *textconv_cache;
 	int textconv_want_cache;
+	const char *process;
+	char *process_owned;
 };
 enum userdiff_driver_type {
 	USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 0/6] [RFC] diff: add diff.<driver>.process for external hunk providers
From: Michael Montalbo via GitGitGadget @ 2026-05-29 20:48 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo
In-Reply-To: <pull.2120.v2.git.1779733799.gitgitgadget@gmail.com>

Language-aware diff tools (e.g., Difftastic) and format-specific analyzers
can produce better line matching than Git's builtin diff algorithm, but
diff.<driver>.command replaces Git's output entirely, losing downstream
features like word diff, function context, color, and blame.

This series adds diff.<driver>.process, a long-running subprocess protocol
that lets an external tool control which lines Git considers changed while
Git handles all output formatting. The protocol follows
filter.<driver>.process: pkt-line over stdin/stdout, capability negotiation,
one process per Git invocation.

The tool receives both file versions and returns changed regions (line
ranges in the old and new file). Git validates and feeds them into the xdiff
pipeline in place of the builtin diff algorithm. When the tool returns no
hunks, Git treats the files as having no changes.

The first two patches add xdiff plumbing for externally supplied hunks and
the diff.<driver>.process config key. Patch 3 refactors the subprocess API
to separate process lifecycle from hashmap management, since the diff
process stores its subprocess on the userdiff driver rather than in a
hashmap. The main feature lands in patch 4. Patch 5 wires up bypass knobs
(--no-ext-diff, format-patch). Patch 6 integrates with blame so the tool can
declare commits as having no changes.

Changes since v2:

 * Extracted subprocess_start_command() from subprocess_start() so the diff
   process can reuse the startup machinery without a hashmap (new patch 3).
 * Subprocess stored on userdiff_driver, no global variables.
 * Differentiated error handling: tool failure warns (with tool name), "not
   configured" is silent, abort is voluntary.
 * Single public entry point: diff_process_fill_hunks() handles driver
   lookup, flag checks, subprocess management, and error reporting for both
   diff.c and blame.c.
 * Rewrote gitattributes protocol section to match filter process convention
   with packet diagrams and fully specified hunk format.
 * Split bypass knobs (--no-ext-diff, format-patch) into a separate commit.
 * SIGPIPE protection, stricter input validation, const correctness.
 * Changed "zero hunks" to "no hunks" throughout.

Michael Montalbo (6):
  xdiff: support external hunks via xpparam_t
  userdiff: add diff.<driver>.process config
  sub-process: separate process lifecycle from hashmap management
  diff: add long-running diff process via diff.<driver>.process
  diff: bypass diff process with --no-ext-diff and in format-patch
  blame: consult diff process for no-hunk detection

 Documentation/config/diff.adoc           |   5 +
 Documentation/diff-algorithm-option.adoc |   3 +
 Documentation/diff-options.adoc          |   4 +-
 Documentation/gitattributes.adoc         | 139 +++++
 Makefile                                 |   1 +
 blame.c                                  |  40 +-
 builtin/log.c                            |   7 +
 diff-process.c                           | 288 ++++++++++
 diff-process.h                           |  39 ++
 diff.c                                   |  29 +-
 diff.h                                   |   5 +
 meson.build                              |   1 +
 sub-process.c                            |  29 +-
 sub-process.h                            |   9 +-
 t/.gitattributes                         |   1 +
 t/meson.build                            |   1 +
 t/t4080-diff-process.sh                  | 660 +++++++++++++++++++++++
 userdiff.c                               |   7 +
 userdiff.h                               |   5 +
 xdiff-interface.c                        |   7 +-
 xdiff/xdiff.h                            |  13 +
 xdiff/xdiffi.c                           |  85 ++-
 xdiff/xprepare.c                         |  10 +
 xdiff/xprepare.h                         |   1 +
 24 files changed, 1368 insertions(+), 21 deletions(-)
 create mode 100644 diff-process.c
 create mode 100644 diff-process.h
 create mode 100755 t/t4080-diff-process.sh


base-commit: 94f057755b7941b321fd11fec1b2e3ca5313a4e0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2120%2Fmmontalbo%2Fmm%2Fstructural-diff-backend-clean-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2120/mmontalbo/mm/structural-diff-backend-clean-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/2120

Range-diff vs v2:

 1:  f887a7e2ba ! 1:  13eb201d63 xdiff: support external hunks via xpparam_t
     @@ Commit message
          numbers, overlapping or out-of-order hunks, negative counts, and
          violations of the synchronization invariant (unchanged line counts
          must match between files).  On validation failure, fall back to
     -    the builtin diff algorithm.
     +    the builtin diff algorithm; this re-runs xdl_prepare_env() since
     +    the first call may have dirtied the changed[] arrays.
      
          Skip trim_common_tail() in xdi_diff() when external hunks are
          present, since external hunks reference line numbers in the
     @@ xdiff/xdiff.h: typedef struct s_xpparam {
       	char **anchors;
       	size_t anchors_nr;
      +
     -+	/* Externally computed hunks: bypass the diff algorithm. */
     -+	const struct xdl_hunk *external_hunks;
     ++	/* Externally computed hunks: bypass the diff algorithm.  Owned by caller. */
     ++	struct xdl_hunk *external_hunks;
      +	size_t external_hunks_nr;
       } xpparam_t;
       
     @@ xdiff/xdiffi.c: static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdf
      + * Returns 0 on success, -1 on validation failure.
      + */
      +static int xdl_populate_hunks_from_external(xdfenv_t *xe,
     -+					    const struct xdl_hunk *hunks,
     ++					    struct xdl_hunk *hunks,
      +					    size_t nr_hunks)
      +{
      +	size_t i;
      +	long j, prev_old_end = 0, prev_new_end = 0;
      +	long total_old = 0, total_new = 0;
      +
     ++	/*
     ++	 * xdl_prepare_env() may dirty changed[] via xdl_cleanup_records().
     ++	 * Clear them so only the external hunks are marked.
     ++	 */
      +	xdl_clear_changed(&xe->xdf1);
      +	xdl_clear_changed(&xe->xdf2);
      +
      +	for (i = 0; i < nr_hunks; i++) {
     -+		const struct xdl_hunk *h = &hunks[i];
     ++		struct xdl_hunk *h = &hunks[i];
      +
      +		if (h->old_count < 0 || h->new_count < 0)
      +			return -1;
     -+
     -+		/* Bounds check (1-based line numbers) */
     -+		if (h->old_count > 0 &&
     -+		    (h->old_start < 1 ||
     -+		     h->old_start + h->old_count - 1 > (long)xe->xdf1.nrec))
     -+			return -1;
     -+		if (h->new_count > 0 &&
     -+		    (h->new_start < 1 ||
     -+		     h->new_start + h->new_count - 1 > (long)xe->xdf2.nrec))
     ++		if (h->old_start < 1 || h->new_start < 1)
      +			return -1;
      +
     -+		/* Zero-count hunks: start must still be in [1, nrec+1] */
     -+		if (h->old_count == 0 &&
     -+		    (h->old_start < 1 || h->old_start > (long)xe->xdf1.nrec + 1))
     ++		/*
     ++		 * Range must fit: start + count - 1 <= nrec,
     ++		 * rewritten to avoid overflow.  Same for both sides.
     ++		 *
     ++		 * When count is 0 (pure insert/delete) the check
     ++		 * reduces to 0 > nrec - start + 1, which rejects
     ++		 * start > nrec + 1 and allows start == nrec + 1
     ++		 * (the position after the last line).
     ++		 */
     ++		if (h->old_count > (long)xe->xdf1.nrec - h->old_start + 1)
      +			return -1;
     -+		if (h->new_count == 0 &&
     -+		    (h->new_start < 1 || h->new_start > (long)xe->xdf2.nrec + 1))
     ++		if (h->new_count > (long)xe->xdf2.nrec - h->new_start + 1)
      +			return -1;
      +
     -+		/* Ordering: no overlap with previous hunk */
     ++		/* Ordering: no overlap with previous hunk (adjacent is OK) */
      +		if (h->old_start < prev_old_end ||
      +		    h->new_start < prev_new_end)
      +			return -1;
     @@ xdiff/xdiffi.c: static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdf
      -	}
      +
      +diff_done:
     -+
       	if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
       	    xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 ||
       	    xdl_build_script(&xe, &xscr) < 0) {
 2:  de6d85f9d7 ! 2:  58f4763c63 userdiff: add diff.<driver>.process config
     @@ Metadata
       ## Commit message ##
          userdiff: add diff.<driver>.process config
      
     -    Add a new per-driver configuration key that specifies the command
     -    for a long-running diff process.
     -
     -    The name follows filter.<driver>.process: a long-running subprocess
     -    that stays alive across files within a single git invocation.
     +    Add the process field to struct userdiff_driver and teach the
     +    config parser to populate it from diff.<driver>.process.
      
          Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
      
 -:  ---------- > 3:  d6c833dd42 sub-process: separate process lifecycle from hashmap management
 3:  c25647c6e5 ! 4:  d044fa0ee5 diff: add long-running diff process via diff.<driver>.process
     @@ Commit message
                  process = /path/to/diff-tool
      
          The tool provides custom line-matching: it receives file pairs
     -    and returns hunks that reference original line numbers.  Unlike
     -    textconv, which transforms the displayed content, the diff
     -    output shows the actual file while the tool controls which
     -    lines are marked as changed.
     +    and returns hunks that reference line numbers in the content.
     +    When textconv is also configured, the tool receives the
     +    textconv-transformed content.  The tool controls which lines
     +    are marked as changed while the display shows the file content.
     +    Patch output features (word diff, function context, color) work
     +    normally; summary formats like --stat use their own diff path
     +    and are not affected.
      
          The handshake negotiates version=1 and capability=hunks.  Per-file
          requests send command=hunks, pathname, and both file contents as
          packetized data.  The tool responds with hunk lines and a status
     -    packet.  On error, git falls back to the builtin diff algorithm
     -    with a warning.
     +    packet (success, error, or abort).  On error, Git warns and falls
     +    back to the builtin diff algorithm for that file.  On abort, Git
     +    silently falls back for the current file and stops sending further
     +    requests to the tool for the remainder of the session.
      
     -    Zero hunks with status=success means the tool considers the
     -    files equivalent.  Git skips diff output for that file.
     +    When the tool returns no hunks followed by status=success, Git
     +    treats the file as having no changes and produces no diff output.
     +    This also means --exit-code reports no changes for that file.
     +
     +    The subprocess is stored on the userdiff_driver struct and
     +    launched on first use.  If the process fails to start, the
     +    handshake fails, or a communication error occurs mid-stream,
     +    the failure is cached on the driver to avoid retrying and
     +    re-warning on every subsequent file.
     +
     +    diff_process_fill_hunks() is the sole public entry point.  It
     +    handles driver lookup, flag checks, subprocess management, and
     +    error reporting, returning an enum that lets callers distinguish
     +    "hunks populated" from "files equivalent" from "not applicable"
     +    from "tool failure."
      
          Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
      
     @@ Documentation/config/diff.adoc: endif::git-diff[]
       	conversion outputs.  See linkgit:gitattributes[5] for details.
       
      +`diff.<driver>.process`::
     -+	The command to run as a long-running diff process.
     -+	The tool communicates via the pkt-line protocol and returns
     -+	hunks that are fed into Git's diff and blame pipelines.
     -+	If the tool returns zero hunks, the file is treated as
     -+	unchanged for both diff output and blame attribution.
     ++	The command to run as a long-running diff process that
     ++	provides hunks to Git's diff pipeline.
      +	See linkgit:gitattributes[5] for details.
      +
       `diff.indentHeuristic`::
     @@ Documentation/gitattributes.adoc: NOTE: If `diff.<name>.command` is defined for
       algorithm is not passed to the external diff driver.
       
      +Using an external diff process
     -+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     -+
     -+An external tool can provide content-aware line matching by
     -+setting `diff.<name>.process` to the command that runs
     -+the tool.  The tool is a long-running process that communicates via
     -+the pkt-line protocol (described in
     ++^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     ++
     ++If `diff.<name>.process` is defined, Git sends the old and new file
     ++content to an external tool and receives back a list of changed
     ++regions (pairs of line ranges in the old and new file).  Git uses
     ++these instead of its builtin diff algorithm, but still controls
     ++all output formatting, so features like word diff, function context,
     ++color, and blame work normally.  This is achieved by using the
     ++long-running process protocol (described in
      +Documentation/technical/long-running-process-protocol.adoc).
     ++Unlike `diff.<name>.command`, which replaces Git's output entirely,
     ++the diff process feeds results back into the standard pipeline.
     ++
     ++First, in `.gitattributes`, assign the `diff` attribute for paths.
      +
      +------------------------
      +*.c diff=cdiff
      +------------------------
      +
     ++Then, define a "diff.<name>.process" configuration to specify
     ++the diff process command.
     ++
      +----------------------------------------------------------------
      +[diff "cdiff"]
      +  process = /path/to/diff-process-tool
      +----------------------------------------------------------------
      +
     -+The tool receives file pairs and returns hunk descriptors indicating
     -+which lines changed.  Git feeds these hunks into its standard diff
     -+pipeline, so all output features (word diff, function context,
     -+color) work normally.
     -+
     -+If the tool fails or returns an error, Git silently falls back to
     -+the builtin diff algorithm.  If the tool returns invalid hunks
     -+(out of bounds, overlapping), Git also falls back silently.
     -+
     -+The handshake negotiates `version=1` and `capability=hunks`.
     -+Per-file requests send `command=hunks` and `pathname=<path>`,
     -+followed by the old and new file content as packetized data.
     -+The tool responds with lines of the form
     -+`hunk <old_start> <old_count> <new_start> <new_count>`
     -+(1-based line numbers), a flush packet, and `status=success`.
     -+
     -+If the tool returns zero hunks with `status=success`, Git treats
     -+the file as having no changes and produces no diff output.
     -+
     -+Tools should ignore unknown keys in the per-file request to
     -+remain forward-compatible.
     ++When Git encounters the first file that needs to be diffed, it starts
     ++the process and performs the handshake.  In the handshake, the welcome
     ++message sent by Git is "git-diff-client", only version 1 is supported,
     ++and the supported capability is "hunks" (the changed regions
     ++described below).
     ++
     ++For each file, Git sends a list of "key=value" pairs terminated with
     ++a flush packet, followed by the old and new file content as packetized
     ++data, each terminated with a flush packet.  The pathname is relative
     ++to the repository root.  When `diff.<name>.textconv` is also set,
     ++the tool receives the textconv-transformed content rather than the
     ++raw blob.  Git does not send binary files to the diff process.
     ++
     ++-----------------------
     ++packet:          git> command=hunks
     ++packet:          git> pathname=path/file.c
     ++packet:          git> 0000
     ++packet:          git> OLD_CONTENT
     ++packet:          git> 0000
     ++packet:          git> NEW_CONTENT
     ++packet:          git> 0000
     ++-----------------------
     ++
     ++The tool is expected to respond with zero or more hunk lines,
     ++a flush packet, and a status packet terminated with a flush packet.
     ++Each hunk line has the form:
     ++
     ++  `hunk <old_start> <old_count> <new_start> <new_count>`
     ++
     ++where `<old_start>` and `<old_count>` identify a range of lines in
     ++the old file, and `<new_start>` and `<new_count>` identify the
     ++replacement range in the new file.  Start values are 1-based and
     ++counts are non-negative.  Ranges must not extend beyond the end of
     ++the file.  For example, `hunk 3 2 3 4` means that 2 lines starting
     ++at line 3 in the old file were replaced by 4 lines starting at
     ++line 3 in the new file.  An `<old_count>` of 0 means no lines were
     ++removed (pure insertion); a `<new_count>` of 0 means no lines were
     ++added (pure deletion).
     ++
     ++Lines are delimited by newlines.  A file `"foo\nbar\n"` and a
     ++file `"foo\nbar"` both have 2 lines.
     ++
     ++Hunks must be listed in order and must not overlap.  Any line
     ++not covered by a hunk is treated as unchanged, so the total
     ++number of unchanged lines must be the same on both sides.
     ++For example, if the old file has 10 lines and the hunks cover
     ++4 of them (`old_count` values summing to 4), then 6 old lines
     ++are unchanged.  The new file must also have exactly 6 lines
     ++not covered by hunks, so the `new_count` values must sum to
     ++`new_file_lines - 6`.
     ++
     ++-----------------------
     ++packet:          git< hunk 1 3 1 5
     ++packet:          git< hunk 10 2 12 2
     ++packet:          git< 0000
     ++packet:          git< status=success
     ++packet:          git< 0000
     ++-----------------------
     ++
     ++If the tool responds with hunks and "success", Git marks those lines
     ++as changed and feeds them into the standard diff pipeline.  Patch
     ++output features (word diff, function context, color) work normally.
     ++Note that `--stat` and other summary formats use their own diff path
     ++and are not affected by the diff process.
     ++
     ++If no hunk lines precede the flush, followed by "success", Git
     ++treats the files as having no changes: `git diff` produces no output
     ++and `git blame` skips the commit, attributing lines to earlier commits.
     ++
     ++-----------------------
     ++packet:          git< 0000
     ++packet:          git< status=success
     ++packet:          git< 0000
     ++-----------------------
     ++
     ++If the tool returns invalid hunks (out of bounds, overlapping), Git
     ++silently falls back to the builtin diff algorithm.
     ++
     ++In case the tool cannot or does not want to process the content,
     ++it is expected to respond with an "error" status.  Git warns and
     ++falls back to the builtin diff algorithm for this file.  The tool
     ++remains available for subsequent files.
     ++
     ++-----------------------
     ++packet:          git< 0000
     ++packet:          git< status=error
     ++packet:          git< 0000
     ++-----------------------
     ++
     ++In case the tool cannot or does not want to process the content as
     ++well as any future content for the lifetime of the Git process, it
     ++is expected to respond with an "abort" status.  Git silently falls
     ++back to the builtin diff algorithm for this file and does not send
     ++further requests to the tool.
     ++
     ++-----------------------
     ++packet:          git< 0000
     ++packet:          git< status=abort
     ++packet:          git< 0000
     ++-----------------------
     ++
     ++If the tool dies during the communication or does not adhere to the
     ++protocol then Git will stop the process and fall back to the builtin
     ++diff algorithm.  Git warns once and does not restart the process for
     ++subsequent files.
     ++
     ++Tools should ignore unknown keys in the per-file request to remain
     ++forward-compatible.  Future versions of Git may send additional
     ++`command=` values; tools that receive an unrecognized command should
     ++respond with `status=error` rather than terminating.
      +
       Defining a custom hunk-header
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     @@ diff-process.c (new)
      +/*
      + * Diff process backend: communicates with a long-running external
      + * tool via the pkt-line protocol to obtain custom line-matching
     -+ * results.  Unlike textconv, which transforms the displayed content,
     -+ * hunks from a diff process reference original line numbers and
     -+ * the display shows the actual file content.
     ++ * results.  The tool controls which lines are marked as changed
     ++ * while the display shows the file content (after any textconv
     ++ * transformation, if configured).
      + *
      + * Protocol: pkt-line over stdin/stdout, following the pattern of
      + * the long-running filter process protocol (see convert.c).
     @@ diff-process.c (new)
      + *   tool< ... / flush
      + *   tool< status=success / flush
      + *
     -+ * Zero hunks with status=success means the tool considers the
     -+ * files equivalent.  Git will skip the diff for that file.
     ++ * When the tool returns no hunks with status=success, it considers
     ++ * the files equivalent.  Git will skip the diff for that file.
      + */
      +
      +#include "git-compat-util.h"
      +#include "diff-process.h"
     ++#include "diff.h"
     ++#include "gettext.h"
     ++#include "repository.h"
     ++#include "sigchain.h"
      +#include "userdiff.h"
      +#include "sub-process.h"
      +#include "pkt-line.h"
     @@ diff-process.c (new)
      +	unsigned int supported_capabilities;
      +};
      +
     -+static int subprocess_map_initialized;
     -+static struct hashmap subprocess_map;
     -+
      +static int start_diff_process_fn(struct subprocess_entry *subprocess)
      +{
      +	static int versions[] = { 1, 0 };
     @@ diff-process.c (new)
      +		{ NULL, 0 }
      +	};
      +	struct diff_subprocess *entry =
     -+		(struct diff_subprocess *)subprocess;
     ++		container_of(subprocess, struct diff_subprocess, subprocess);
      +
     -+	/* Uses dying pkt-line variant, same as convert.c filters. */
      +	return subprocess_handshake(subprocess, "git-diff",
      +				    versions, NULL,
      +				    capabilities,
      +				    &entry->supported_capabilities);
      +}
      +
     -+static struct diff_subprocess *find_or_start_process(const char *cmd)
     ++static struct diff_subprocess *get_or_launch_process(
     ++		struct userdiff_driver *drv)
      +{
      +	struct diff_subprocess *entry;
      +
     -+	if (!subprocess_map_initialized) {
     -+		subprocess_map_initialized = 1;
     -+		hashmap_init(&subprocess_map, cmd2process_cmp, NULL, 0);
     -+	}
     -+
     -+	entry = (struct diff_subprocess *)
     -+		subprocess_find_entry(&subprocess_map, cmd);
     -+	if (entry)
     -+		return entry;
     ++	if (drv->diff_subprocess)
     ++		return drv->diff_subprocess;
      +
      +	entry = xcalloc(1, sizeof(*entry));
     -+	if (subprocess_start(&subprocess_map, &entry->subprocess,
     -+			     cmd, start_diff_process_fn)) {
     ++	if (subprocess_start_command(&entry->subprocess, drv->process,
     ++				     start_diff_process_fn)) {
      +		free(entry);
     ++		drv->diff_process_failed = 1;
      +		return NULL;
      +	}
      +
     ++	drv->diff_subprocess = entry;
      +	return entry;
      +}
      +
      +static int send_file_content(int fd, const char *buf, long size)
      +{
     -+	int ret;
     ++	int ret = 0;
      +
     ++	if (size < 0)
     ++		return -1;
      +	if (size > 0)
      +		ret = write_packetized_from_buf_no_flush(buf, size, fd);
     -+	else
     -+		ret = 0;
      +	if (ret)
      +		return ret;
      +	return packet_flush_gently(fd);
     @@ diff-process.c (new)
      +{
      +	char *end;
      +
     -+	/* Format: "hunk <old_start> <old_count> <new_start> <new_count>" */
     ++	/*
     ++	 * Format: "hunk <old_start> <old_count> <new_start> <new_count>"
     ++	 * All numbers must be non-negative decimal with no leading
     ++	 * whitespace or sign characters.
     ++	 */
      +	if (!skip_prefix(line, "hunk ", &line))
      +		return -1;
      +
     ++	if (!isdigit(*line))
     ++		return -1;
     ++	errno = 0;
      +	hunk->old_start = strtol(line, &end, 10);
     -+	if (end == line || *end != ' ')
     ++	if (errno || end == line || *end++ != ' ')
      +		return -1;
      +	line = end;
      +
     ++	if (!isdigit(*line))
     ++		return -1;
     ++	errno = 0;
      +	hunk->old_count = strtol(line, &end, 10);
     -+	if (end == line || *end != ' ')
     ++	if (errno || end == line || *end++ != ' ')
      +		return -1;
      +	line = end;
      +
     ++	if (!isdigit(*line))
     ++		return -1;
     ++	errno = 0;
      +	hunk->new_start = strtol(line, &end, 10);
     -+	if (end == line || *end != ' ')
     ++	if (errno || end == line || *end++ != ' ')
      +		return -1;
      +	line = end;
      +
     ++	if (!isdigit(*line))
     ++		return -1;
     ++	errno = 0;
      +	hunk->new_count = strtol(line, &end, 10);
     -+	if (end == line || *end != '\0')
     ++	if (errno || end == line || *end != '\0')
      +		return -1;
      +
      +	return 0;
      +}
      +
     -+int diff_process_get_hunks(struct userdiff_driver *drv,
     -+			   const char *path,
     -+			   const char *old_buf, long old_size,
     -+			   const char *new_buf, long new_size,
     -+			   struct xdl_hunk **hunks_out,
     -+			   size_t *nr_hunks_out)
     ++static enum diff_process_result get_hunks(
     ++		struct userdiff_driver *drv,
     ++		const char *path,
     ++		const char *old_buf, long old_size,
     ++		const char *new_buf, long new_size,
     ++		struct xdl_hunk **hunks_out,
     ++		size_t *nr_hunks_out)
      +{
      +	struct diff_subprocess *backend;
      +	struct child_process *process;
     @@ diff-process.c (new)
      +	int len;
      +	char *line;
      +
     -+	if (!drv || !drv->process)
     -+		return -1;
     -+
     -+	backend = find_or_start_process(drv->process);
     ++	backend = get_or_launch_process(drv);
      +	if (!backend)
     -+		return -1;
     ++		return DIFF_PROCESS_ERROR;
      +
      +	if (!(backend->supported_capabilities & CAP_HUNKS))
     -+		return -1;
     ++		return DIFF_PROCESS_SKIP;
      +
      +	process = subprocess_get_child_process(&backend->subprocess);
      +	fd_in = process->in;
      +	fd_out = process->out;
      +
     ++	sigchain_push(SIGPIPE, SIG_IGN);
     ++
      +	/* Send request */
      +	if (packet_write_fmt_gently(fd_in, "command=hunks\n") ||
      +	    packet_write_fmt_gently(fd_in, "pathname=%s\n", path) ||
      +	    packet_flush_gently(fd_in))
     -+		goto error;
     ++		goto comm_error;
      +
      +	/* Send old file content */
      +	if (send_file_content(fd_in, old_buf, old_size))
     -+		goto error;
     ++		goto comm_error;
      +
      +	/* Send new file content */
      +	if (send_file_content(fd_in, new_buf, new_size))
     -+		goto error;
     ++		goto comm_error;
      +
      +	/* Read hunks until flush packet */
      +	while ((len = packet_read_line_gently(fd_out, NULL, &line)) >= 0 &&
      +	       line) {
      +		if (parse_hunk_line(line, &hunk) < 0)
     -+			goto error;
     ++			goto comm_error;
      +		ALLOC_GROW(hunks, nr_hunks + 1, alloc_hunks);
      +		hunks[nr_hunks++] = hunk;
      +	}
      +	if (len < 0)
     -+		goto error;
     ++		goto comm_error;
      +
      +	/* Read status */
      +	if (subprocess_read_status(fd_out, &status))
     -+		goto error;
     ++		goto comm_error;
     ++
     ++	if (!strcmp(status.buf, "success")) {
     ++		*hunks_out = hunks;
     ++		*nr_hunks_out = nr_hunks;
     ++		strbuf_release(&status);
     ++		sigchain_pop(SIGPIPE);
     ++		return DIFF_PROCESS_OK;
     ++	}
      +
     -+	if (strcmp(status.buf, "success")) {
     -+		if (!strcmp(status.buf, "abort"))
     -+			backend->supported_capabilities &= ~CAP_HUNKS;
     -+		goto error;
     ++	if (!strcmp(status.buf, "abort")) {
     ++		/*
     ++		 * The tool voluntarily withdrew: stop sending requests
     ++		 * but do not warn (this is not a failure).
     ++		 */
     ++		backend->supported_capabilities &= ~CAP_HUNKS;
     ++		free(hunks);
     ++		strbuf_release(&status);
     ++		sigchain_pop(SIGPIPE);
     ++		return DIFF_PROCESS_SKIP;
      +	}
      +
     -+	*hunks_out = hunks;
     -+	*nr_hunks_out = nr_hunks;
     ++	/* status=error or unknown status */
     ++	free(hunks);
      +	strbuf_release(&status);
     -+	return 0;
     -+
     -+error:
     ++	sigchain_pop(SIGPIPE);
     ++	return DIFF_PROCESS_ERROR;
     ++
     ++comm_error:
     ++	/*
     ++	 * Communication failure (broken pipe, malformed response).
     ++	 * Tear down the process and mark as failed so we do not
     ++	 * retry on every subsequent file.
     ++	 */
     ++	drv->diff_process_failed = 1;
     ++	drv->diff_subprocess = NULL;
     ++	subprocess_stop_command(&backend->subprocess);
     ++	free(backend);
      +	free(hunks);
      +	strbuf_release(&status);
     -+	return -1;
     ++	sigchain_pop(SIGPIPE);
     ++	return DIFF_PROCESS_ERROR;
     ++}
     ++
     ++enum diff_process_result diff_process_fill_hunks(
     ++		struct diff_options *diffopt,
     ++		const char *path,
     ++		const mmfile_t *file_a,
     ++		const mmfile_t *file_b,
     ++		xpparam_t *xpp)
     ++{
     ++	struct userdiff_driver *drv;
     ++	struct xdl_hunk *ext_hunks = NULL;
     ++	size_t nr = 0;
     ++	enum diff_process_result res;
     ++
     ++	if (!diffopt || !path)
     ++		return DIFF_PROCESS_SKIP;
     ++	if (diffopt->flags.no_diff_process || diffopt->ignore_driver_algorithm)
     ++		return DIFF_PROCESS_SKIP;
     ++
     ++	drv = userdiff_find_by_path(diffopt->repo->index, path);
     ++	if (!drv || !drv->process)
     ++		return DIFF_PROCESS_SKIP;
     ++	if (drv->diff_process_failed)
     ++		return DIFF_PROCESS_SKIP;
     ++
     ++	res = get_hunks(drv, path,
     ++			file_a->ptr, file_a->size,
     ++			file_b->ptr, file_b->size,
     ++			&ext_hunks, &nr);
     ++	if (res == DIFF_PROCESS_OK) {
     ++		if (!nr) {
     ++			free(ext_hunks);
     ++			return DIFF_PROCESS_EQUIVALENT;
     ++		}
     ++		xpp->external_hunks = ext_hunks;
     ++		xpp->external_hunks_nr = nr;
     ++		return DIFF_PROCESS_OK;
     ++	}
     ++	if (res == DIFF_PROCESS_ERROR) {
     ++		warning(_("diff process '%s' failed for '%s',"
     ++			  " falling back to builtin diff"),
     ++			drv->process, path);
     ++		return DIFF_PROCESS_ERROR;
     ++	}
     ++	return DIFF_PROCESS_SKIP;
      +}
      
       ## diff-process.h (new) ##
     @@ diff-process.h (new)
      +#ifndef DIFF_PROCESS_H
      +#define DIFF_PROCESS_H
      +
     -+struct userdiff_driver;
     -+struct xdl_hunk;
     ++#include "xdiff/xdiff.h"
     ++
     ++struct diff_options;
     ++
     ++enum diff_process_result {
     ++	DIFF_PROCESS_ERROR = -1, /* tool failure: warned, fell back */
     ++	DIFF_PROCESS_OK = 0,     /* hunks populated in xpp */
     ++	DIFF_PROCESS_SKIP,       /* no process configured: use builtin */
     ++	DIFF_PROCESS_EQUIVALENT, /* tool says files are equivalent */
     ++};
      +
      +/*
     -+ * Query a diff process for hunks describing the changes
     -+ * between old_buf and new_buf.
     ++ * Consult the diff process configured for 'path' and populate
     ++ * xpp->external_hunks with the returned hunks.
      + *
     -+ * The backend is a long-running subprocess configured via
     -+ * diff.<driver>.process.  It receives file content via
     -+ * pkt-line and returns hunks with 1-based line numbers.
     ++ * Handles driver lookup, flag checks (--no-ext-diff,
     ++ * --diff-algorithm), subprocess management, and error reporting.
      + *
     -+ * On success, sets *hunks_out and *nr_hunks_out to a newly allocated
     -+ * array (caller must free) and returns 0.
     ++ * Returns DIFF_PROCESS_OK when hunks are populated in xpp.
     ++ * The caller owns xpp->external_hunks and must free() it.
      + *
     -+ * On failure, returns -1.  The caller should fall back to the
     -+ * builtin diff algorithm.
     ++ * Returns DIFF_PROCESS_EQUIVALENT when the tool returns no hunks
     ++ * (files are considered identical); caller should skip diff/blame.
     ++ * Returns DIFF_PROCESS_SKIP when no process applies; caller
     ++ * should use the builtin diff algorithm.
     ++ * Returns DIFF_PROCESS_ERROR on tool failure (already warned);
     ++ * caller should fall back to the builtin diff algorithm.
      + */
     -+int diff_process_get_hunks(struct userdiff_driver *drv,
     -+			   const char *path,
     -+			   const char *old_buf, long old_size,
     -+			   const char *new_buf, long new_size,
     -+			   struct xdl_hunk **hunks_out,
     -+			   size_t *nr_hunks_out);
     ++enum diff_process_result diff_process_fill_hunks(
     ++		struct diff_options *diffopt,
     ++		const char *path,
     ++		const mmfile_t *file_a,
     ++		const mmfile_t *file_b,
     ++		xpparam_t *xpp);
      +
      +#endif /* DIFF_PROCESS_H */
      
     @@ diff.c
       #include "submodule.h"
       #include "hashmap.h"
       #include "mem-pool.h"
     -@@ diff.c: static void builtin_diff(const char *name_a,
     - 		xpparam_t xpp;
     - 		xdemitconf_t xecfg;
     - 		struct emit_callback ecbdata;
     -+		struct xdl_hunk *ext_hunks = NULL;
     - 		unsigned ws_rule;
     - 		const struct userdiff_funcname *pe;
     - 
      @@ diff.c: static void builtin_diff(const char *name_a,
       		xpp.ignore_regex_nr = o->ignore_regex_nr;
       		xpp.anchors = o->anchors;
       		xpp.anchors_nr = o->anchors_nr;
      +
     -+		if (!o->ignore_driver_algorithm &&
     -+		    one->driver && one->driver->process) {
     -+			size_t ext_hunks_nr = 0;
     -+			if (!diff_process_get_hunks(
     -+				    one->driver, name_a,
     -+				    mf1.ptr, mf1.size,
     -+				    mf2.ptr, mf2.size,
     -+				    &ext_hunks, &ext_hunks_nr)) {
     -+				if (!ext_hunks_nr)
     -+					goto free_ab_and_return;
     -+				xpp.external_hunks = ext_hunks;
     -+				xpp.external_hunks_nr = ext_hunks_nr;
     -+			} else {
     -+				warning(_("diff process failed for '%s',"
     -+					  " falling back to builtin diff"),
     -+					name_a);
     -+			}
     ++		if (diff_process_fill_hunks(o, name_a,
     ++					    &mf1, &mf2, &xpp)
     ++		    == DIFF_PROCESS_EQUIVALENT) {
     ++			if (textconv_one)
     ++				free(mf1.ptr);
     ++			if (textconv_two)
     ++				free(mf2.ptr);
     ++			goto free_ab_and_return;
      +		}
      +
       		xecfg.ctxlen = o->context;
     @@ diff.c: static void builtin_diff(const char *name_a,
       		} else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
       					 &ecbdata, &xpp, &xecfg))
       			die("unable to generate diff for %s", one->path);
     -+		free(ext_hunks);
     ++		free(xpp.external_hunks);
       		if (o->word_diff)
       			free_diff_words_data(&ecbdata);
       		if (textconv_one)
      
     + ## diff.h ##
     +@@ diff.h: struct diff_flags {
     + 	 */
     + 	unsigned allow_external;
     + 
     ++	/** Disables diff.<driver>.process. */
     ++	unsigned no_diff_process;
     ++
     + 	/**
     + 	 * For communication between the calling program and the options parser;
     + 	 * tell the calling program to signal the presence of difference using
     +
     + ## meson.build ##
     +@@ meson.build: libgit_sources = [
     +   'diff-merges.c',
     +   'diff-lib.c',
     +   'diff-no-index.c',
     ++  'diff-process.c',
     +   'diff.c',
     +   'diffcore-break.c',
     +   'diffcore-delta.c',
     +
       ## t/.gitattributes ##
      @@ t/.gitattributes: t[0-9][0-9][0-9][0-9]/* -whitespace
       /t8005/*.txt eol=lf
     @@ t/.gitattributes: t[0-9][0-9][0-9][0-9]/* -whitespace
       /t0040*.sh whitespace=-indent-with-non-tab
      +/t4080-diff-process.sh whitespace=-indent-with-non-tab
      
     + ## t/meson.build ##
     +@@ t/meson.build: integration_tests = [
     +   't4072-diff-max-depth.sh',
     +   't4073-diff-stat-name-width.sh',
     +   't4074-diff-shifted-matched-group.sh',
     ++  't4080-diff-process.sh',
     +   't4100-apply-stat.sh',
     +   't4101-apply-nonl.sh',
     +   't4102-apply-rename.sh',
     +
       ## t/t4080-diff-process.sh (new) ##
      @@
      +#!/bin/sh
     @@ t/t4080-diff-process.sh (new)
      +#   whole-file  - report all lines as changed (default)
      +#   fixed-hunk  - always report hunk 5 2 5 2
      +#   bad-hunk    - report out-of-bounds hunk 999 1 999 1
     -+#   zero-hunk   - return zero hunks (files considered equivalent)
     ++#   bad-sync    - report hunk with mismatched unchanged totals
     ++#   overlap     - report two overlapping hunks
     ++#   no-hunks   - return no hunks (files considered equivalent)
      +#   error       - return status=error for every request
      +#   abort       - return status=abort for every request
      +#   crash       - read one request then exit without responding
     @@ t/t4080-diff-process.sh (new)
      +	    if cmd is None: sys.exit(0)
      +	    old = read_content()
      +	    new = read_content()
     -+	    log(f"command={cmd} pathname={pathname}")
     ++	    old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
     ++	    new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
     ++	    log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
      +
      +	    if mode == "error":
      +	        write_flush()
     @@ t/t4080-diff-process.sh (new)
      +	            write_pkt("hunk 5 2 5 2")
      +	        elif mode == "bad-hunk":
      +	            write_pkt("hunk 999 1 999 1")
     -+	        elif mode == "zero-hunk":
     ++	        elif mode == "bad-sync":
     ++	            write_pkt("hunk 1 2 1 1")
     ++	        elif mode == "overlap":
     ++	            write_pkt("hunk 1 5 1 5")
     ++	            write_pkt("hunk 3 2 3 2")
     ++	        elif mode == "no-hunks":
      +	            pass
      +	        else:
     -+	            ol = len(old.split(b"\n"))
     -+	            nl = len(new.split(b"\n"))
     ++	            ol = old.count(b"\n")
     ++	            nl = new.count(b"\n")
      +	            write_pkt(f"hunk 1 {ol} 1 {nl}")
      +	        write_flush()
      +	        write_pkt("status=success")
     @@ t/t4080-diff-process.sh (new)
      +	setup_backend &&
      +	echo "*.c diff=cdiff" >.gitattributes &&
      +	git add .gitattributes &&
     -+	git commit -m "initial"
     -+'
      +
     -+test_expect_success PYTHON 'diff process hunk boundaries affect output' '
     ++	# boundary.c: 10 lines, changes at 5-6 and 9-10.
     ++	# Used by: hunk boundaries, error fallback, crash, bad hunks, overlap.
      +	cat >boundary.c <<-\EOF &&
      +	line1
      +	line2
     @@ t/t4080-diff-process.sh (new)
      +	OLD10
      +	EOF
      +	git add boundary.c &&
     -+	git commit -m "add boundary.c" &&
      +
     ++	# worddiff.c: single-line function, value changes 1 -> 999.
     ++	# Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat.
     ++	cat >worddiff.c <<-\EOF &&
     ++	int value(void) { return 1; }
     ++	EOF
     ++	git add worddiff.c &&
     ++
     ++	# newfile.c: single-line function, value changes 42 -> 99.
     ++	# Used by: new file, --exit-code, multiple drivers.
     ++	cat >newfile.c <<-\EOF &&
     ++	int new_func(void) { return 42; }
     ++	EOF
     ++	git add newfile.c &&
     ++
     ++	# logtest.c: single-line function for log/format-patch tests.
     ++	# Needs two commits so log -1 has a diff.
     ++	cat >logtest.c <<-\EOF &&
     ++	int logfunc(void) { return 1; }
     ++	EOF
     ++	git add logtest.c &&
     ++
     ++	# two.c/one.c: two-file pair for error/abort/startup-failure tests.
     ++	cat >one.c <<-\EOF &&
     ++	int first(void) { return 1; }
     ++	EOF
     ++	cat >two.c <<-\EOF &&
     ++	int second(void) { return 2; }
     ++	EOF
     ++	git add one.c two.c &&
     ++
     ++	git commit -m "initial" &&
     ++
     ++	# Second commit for logtest.c (so log -1 has something to show).
     ++	cat >logtest.c <<-\EOF &&
     ++	int logfunc(void) { return 2; }
     ++	EOF
     ++	git add logtest.c &&
     ++	git commit -m "change logtest.c" &&
     ++
     ++	# Working tree modifications (not committed).
      +	cat >boundary.c <<-\EOF &&
      +	line1
      +	line2
     @@ t/t4080-diff-process.sh (new)
      +	NEW10
      +	EOF
      +
     ++	cat >worddiff.c <<-\EOF &&
     ++	int value(void) { return 999; }
     ++	EOF
     ++
     ++	cat >newfile.c <<-\EOF &&
     ++	int new_func(void) { return 99; }
     ++	EOF
     ++
     ++	cat >one.c <<-\EOF &&
     ++	int first(void) { return 10; }
     ++	EOF
     ++
     ++	cat >two.c <<-\EOF
     ++	int second(void) { return 20; }
     ++	EOF
     ++'
     ++
     ++#
     ++# Core behavior: the tool controls which lines are marked as changed.
     ++#
     ++
     ++test_expect_success PYTHON 'diff process hunk boundaries affect output' '
      +	# The file has changes at lines 5-6 and 9-10, but fixed-hunk
      +	# only reports lines 5-6 as changed.  Lines 9-10 should not
      +	# appear as changed in the output.
     @@ t/t4080-diff-process.sh (new)
      +	test_grep "^-OLD6" actual &&
      +	test_grep "^+NEW5" actual &&
      +	test_grep "^+NEW6" actual &&
     -+	! test_grep "^-OLD9" actual &&
     -+	! test_grep "^-OLD10" actual &&
     -+	! test_grep "^+NEW9" actual &&
     -+	! test_grep "^+NEW10" actual
     ++	test_grep ! "^-OLD9" actual &&
     ++	test_grep ! "^-OLD10" actual &&
     ++	test_grep ! "^+NEW9" actual &&
     ++	test_grep ! "^+NEW10" actual
      +'
      +
     -+test_expect_success PYTHON 'diff process fallback on tool error status' '
     ++test_expect_success PYTHON 'diff process works with new file' '
      +	rm -f backend.log &&
     -+	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
     -+		diff boundary.c >actual &&
     -+	# Fallback produces the full builtin diff (both change regions).
     -+	test_grep "^-OLD5" actual &&
     -+	test_grep "^+NEW5" actual &&
     -+	test_grep "^-OLD9" actual &&
     -+	test_grep "^+NEW9" actual &&
     -+	# Tool was contacted (it replied with error, not crash).
     -+	test_grep "command=hunks pathname=boundary.c" backend.log
     ++	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
     ++		diff -- newfile.c >actual 2>stderr &&
     ++	test_grep "return 99" actual &&
     ++	test_grep "pathname=newfile.c" backend.log &&
     ++	test_must_be_empty stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process fallback on bad hunks' '
     -+	git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
     -+		diff boundary.c >actual &&
     -+	test_grep "^-OLD5" actual &&
     -+	test_grep "^+NEW5" actual &&
     -+	test_grep "^-OLD9" actual &&
     -+	test_grep "^+NEW9" actual
     ++test_expect_success PYTHON 'diff process works with added file (empty old side)' '
     ++	cat >added.c <<-\EOF &&
     ++	int added(void) { return 1; }
     ++	EOF
     ++	git add added.c &&
     ++
     ++	rm -f backend.log &&
     ++	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
     ++		diff --cached -- added.c >actual 2>stderr &&
     ++	test_grep "added" actual &&
     ++	test_grep "pathname=added.c" backend.log &&
     ++	test_must_be_empty stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process fallback on tool crash' '
     -+	git -c diff.cdiff.process="$BACKEND --mode=crash" \
     -+		diff boundary.c >actual &&
     -+	test_grep "^-OLD5" actual &&
     -+	test_grep "^+NEW5" actual &&
     -+	test_grep "^-OLD9" actual &&
     -+	test_grep "^+NEW9" actual
     ++test_expect_success PYTHON 'diff process skipped for binary files' '
     ++	printf "\\0binary" >binary.c &&
     ++	git add binary.c &&
     ++	git commit -m "add binary" &&
     ++	printf "\\0changed" >binary.c &&
     ++
     ++	rm -f backend.log &&
     ++	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
     ++		diff -- binary.c >actual &&
     ++	test_grep "Binary files" actual &&
     ++	test_path_is_missing backend.log
      +'
      +
     -+test_expect_success PYTHON 'diff process abort disables for session' '
     -+	cat >abort1.c <<-\EOF &&
     -+	int first(void) { return 1; }
     -+	EOF
     -+	cat >abort2.c <<-\EOF &&
     -+	int second(void) { return 2; }
     -+	EOF
     -+	git add abort1.c abort2.c &&
     -+	git commit -m "add abort files" &&
     ++test_expect_success PYTHON 'diff process not consulted for unmatched driver' '
     ++	echo "not tracked by cdiff" >unmatched.txt &&
     ++	git add unmatched.txt &&
     ++	git commit -m "add unmatched.txt" &&
      +
     -+	cat >abort1.c <<-\EOF &&
     -+	int first(void) { return 10; }
     -+	EOF
     -+	cat >abort2.c <<-\EOF &&
     -+	int second(void) { return 20; }
     -+	EOF
     ++	echo "modified" >unmatched.txt &&
      +
      +	rm -f backend.log &&
     -+	git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
     -+		diff -- abort1.c abort2.c >actual &&
     -+	# Both files should still produce diff output via fallback.
     -+	test_grep "return 10" actual &&
     -+	test_grep "return 20" actual &&
     -+	# The tool aborts on the first file and git clears its
     -+	# capability.  The second file never contacts the tool,
     -+	# so the log should have exactly one entry, not two.
     -+	test_grep "command=hunks" backend.log >matches &&
     -+	test_line_count = 1 matches
     ++	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
     ++		diff -- unmatched.txt >actual &&
     ++	test_grep "modified" actual &&
     ++	test_path_is_missing backend.log
      +'
      +
     -+test_expect_success PYTHON 'diff process handles multiple files' '
     -+	cat >multi1.c <<-\EOF &&
     -+	int one(void) { return 1; }
     ++test_expect_success PYTHON 'multiple drivers use separate processes' '
     ++	echo "*.h diff=hdiff" >>.gitattributes &&
     ++	git add .gitattributes &&
     ++
     ++	cat >multi.h <<-\EOF &&
     ++	int header(void) { return 1; }
      +	EOF
     -+	cat >multi2.c <<-\EOF &&
     -+	int two(void) { return 2; }
     ++	git add multi.h &&
     ++	git commit -m "add multi.h" &&
     ++
     ++	cat >multi.h <<-\EOF &&
     ++	int header(void) { return 2; }
      +	EOF
     -+	git add multi1.c multi2.c &&
     -+	git commit -m "add multi files" &&
      +
     -+	cat >multi1.c <<-\EOF &&
     -+	int one(void) { return 10; }
     ++	rm -f backend-c.log backend-h.log &&
     ++	git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \
     ++	    -c diff.hdiff.process="$BACKEND --log=backend-h.log" \
     ++		diff -- newfile.c multi.h >actual 2>stderr &&
     ++	test_grep "pathname=newfile.c" backend-c.log &&
     ++	test_grep "pathname=multi.h" backend-h.log &&
     ++	test_must_be_empty stderr
     ++'
     ++
     ++test_expect_success PYTHON 'diff process works alongside textconv' '
     ++	write_script uppercase-filter <<-\EOF &&
     ++	tr "a-z" "A-Z" <"$1"
      +	EOF
     -+	cat >multi2.c <<-\EOF &&
     -+	int two(void) { return 20; }
     ++
     ++	cat >textconv.c <<-\EOF &&
     ++	hello world
     ++	EOF
     ++	git add textconv.c &&
     ++	git commit -m "add textconv.c" &&
     ++
     ++	cat >textconv.c <<-\EOF &&
     ++	goodbye world
      +	EOF
      +
      +	rm -f backend.log &&
     -+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
     -+		diff -- multi1.c multi2.c >actual &&
     -+	test_grep "return 10" actual &&
     -+	test_grep "return 20" actual &&
     -+	test_grep "pathname=multi1.c" backend.log &&
     -+	test_grep "pathname=multi2.c" backend.log
     ++	git -c diff.cdiff.textconv="./uppercase-filter" \
     ++	    -c diff.cdiff.process="$BACKEND --log=backend.log" \
     ++		diff -- textconv.c >actual 2>stderr &&
     ++	# The diff process receives textconv-transformed (uppercase) content.
     ++	test_grep "pathname=textconv.c" backend.log &&
     ++	test_grep "old=HELLO WORLD" backend.log &&
     ++	test_grep "new=GOODBYE WORLD" backend.log &&
     ++	test_must_be_empty stderr
      +'
      +
     ++#
     ++# Downstream features: word diff, log, equivalent files, exit code.
     ++#
     ++
      +test_expect_success PYTHON 'diff process with --word-diff' '
     -+	cat >worddiff.c <<-\EOF &&
     -+	int value(void) { return 1; }
     ++	rm -f backend.log &&
     ++	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
     ++		diff --word-diff worddiff.c >actual 2>stderr &&
     ++	test_grep "\[-1;-\]" actual &&
     ++	test_grep "{+999;+}" actual &&
     ++	test_grep "pathname=worddiff.c" backend.log &&
     ++	test_must_be_empty stderr
     ++'
     ++
     ++test_expect_success PYTHON 'diff process works with git log -p' '
     ++	# With no-hunks mode, the tool says the files are equivalent,
     ++	# so log -p should show the commit but no diff content.
     ++	rm -f backend.log &&
     ++	git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
     ++		log -1 -p -- logtest.c >actual 2>stderr &&
     ++	test_grep "change logtest.c" actual &&
     ++	test_grep ! "return 2" actual &&
     ++	test_grep "command=hunks pathname=logtest.c" backend.log &&
     ++	test_must_be_empty stderr
     ++'
     ++
     ++test_expect_success PYTHON 'diff process no hunks suppresses diff output' '
     ++	cat >nohunks.c <<-\EOF &&
     ++	int zero(void) { return 0; }
      +	EOF
     -+	git add worddiff.c &&
     -+	git commit -m "add worddiff.c" &&
     ++	git add nohunks.c &&
     ++	git commit -m "add nohunks.c" &&
      +
     -+	cat >worddiff.c <<-\EOF &&
     -+	int value(void) { return 999; }
     ++	cat >nohunks.c <<-\EOF &&
     ++	int zero(void) { return 999; }
      +	EOF
      +
     -+	git -c diff.cdiff.process="$BACKEND" \
     -+		diff --word-diff worddiff.c >actual &&
     -+	test_grep "\[-1;-\]" actual &&
     -+	test_grep "{+999;+}" actual
     ++	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
     ++		diff nohunks.c >actual &&
     ++	test_must_be_empty actual
     ++'
     ++
     ++test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' '
     ++	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
     ++		diff --exit-code nohunks.c
     ++'
     ++
     ++test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' '
     ++	test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \
     ++		diff --exit-code newfile.c
      +'
      +
     ++#
     ++# Bypass mechanisms: flags and commands that skip the diff process.
     ++#
     ++
      +test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
      +	rm -f backend.log &&
      +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
     @@ t/t4080-diff-process.sh (new)
      +	test_path_is_missing backend.log
      +'
      +
     -+test_expect_success PYTHON 'diff process works with git log -p' '
     -+	cat >logtest.c <<-\EOF &&
     -+	int logfunc(void) { return 1; }
     -+	EOF
     -+	git add logtest.c &&
     -+	git commit -m "add logtest.c" &&
     ++test_expect_success PYTHON 'diff process not used by --stat' '
     ++	rm -f backend.log &&
     ++	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
     ++		diff --stat worddiff.c >actual &&
     ++	test_grep "worddiff.c" actual &&
     ++	test_path_is_missing backend.log
     ++'
      +
     -+	cat >logtest.c <<-\EOF &&
     -+	int logfunc(void) { return 2; }
     -+	EOF
     -+	git add logtest.c &&
     -+	git commit -m "change logtest.c" &&
     ++#
     ++# Error handling and fallback.
     ++#
      +
     ++test_expect_success PYTHON 'diff process fallback on tool error status' '
      +	rm -f backend.log &&
     -+	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
     -+		log -1 -p -- logtest.c >actual &&
     -+	test_grep "return 2" actual &&
     -+	test_grep "command=hunks pathname=logtest.c" backend.log
     ++	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
     ++		diff boundary.c >actual 2>stderr &&
     ++	# Fallback produces the full builtin diff (both change regions).
     ++	test_grep "^-OLD5" actual &&
     ++	test_grep "^+NEW5" actual &&
     ++	test_grep "^-OLD9" actual &&
     ++	test_grep "^+NEW9" actual &&
     ++	# Tool was contacted (it replied with error, not crash).
     ++	test_grep "command=hunks pathname=boundary.c" backend.log &&
     ++	test_grep "diff process.*failed" stderr
      +'
      +
     -+test_expect_success PYTHON 'diff process zero hunks suppresses diff output' '
     -+	cat >zerohunk.c <<-\EOF &&
     -+	int zero(void) { return 0; }
     ++test_expect_success PYTHON 'diff process error keeps tool available for next file' '
     ++	rm -f backend.log &&
     ++	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
     ++		diff -- one.c two.c >actual 2>stderr &&
     ++	# Unlike abort, error keeps the tool available: both files
     ++	# are sent to the tool (and both fall back).
     ++	test_grep "pathname=one.c" backend.log &&
     ++	test_grep "pathname=two.c" backend.log &&
     ++	test_grep "return 10" actual &&
     ++	test_grep "return 20" actual
     ++'
     ++
     ++test_expect_success PYTHON 'diff process abort disables for session' '
     ++	rm -f backend.log &&
     ++	git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
     ++		diff -- one.c two.c >actual &&
     ++	# Both files should still produce diff output via fallback.
     ++	test_grep "return 10" actual &&
     ++	test_grep "return 20" actual &&
     ++	# The tool aborts on the first file and git clears its
     ++	# capability.  The second file never contacts the tool.
     ++	test_grep "pathname=one.c" backend.log &&
     ++	test_grep ! "pathname=two.c" backend.log
     ++'
     ++
     ++test_expect_success PYTHON 'diff process fallback on tool crash' '
     ++	git -c diff.cdiff.process="$BACKEND --mode=crash" \
     ++		diff boundary.c >actual 2>stderr &&
     ++	test_grep "^-OLD5" actual &&
     ++	test_grep "^+NEW5" actual &&
     ++	test_grep "^-OLD9" actual &&
     ++	test_grep "^+NEW9" actual &&
     ++	# Crash is a communication failure, so a warning is emitted.
     ++	test_grep "diff process.*failed" stderr
     ++'
     ++
     ++test_expect_success PYTHON 'diff process startup failure only warns once' '
     ++	git -c diff.cdiff.process="/nonexistent/tool" \
     ++		diff -- one.c two.c >actual 2>stderr &&
     ++	# Both files produce diff output via fallback.
     ++	test_grep "return 10" actual &&
     ++	test_grep "return 20" actual &&
     ++	# Sentinel prevents repeated warnings: only one, not one per file.
     ++	test_grep "diff process.*failed" stderr >warnings &&
     ++	test_line_count = 1 warnings
     ++'
     ++
     ++test_expect_success PYTHON 'diff process fallback on bad hunks' '
     ++	git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
     ++		diff boundary.c >actual 2>stderr &&
     ++	test_grep "^-OLD5" actual &&
     ++	test_grep "^+NEW5" actual &&
     ++	test_grep "^-OLD9" actual &&
     ++	test_grep "^+NEW9" actual &&
     ++	# Invalid hunks are caught by xdiff validation, not the
     ++	# protocol layer, so no warning is emitted.
     ++	test_must_be_empty stderr
     ++'
     ++
     ++test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' '
     ++	cat >synctest.c <<-\EOF &&
     ++	line1
     ++	line2
     ++	line3
      +	EOF
     -+	git add zerohunk.c &&
     -+	git commit -m "add zerohunk.c" &&
     ++	git add synctest.c &&
     ++	git commit -m "add synctest.c" &&
      +
     -+	cat >zerohunk.c <<-\EOF &&
     -+	int zero(void) { return 999; }
     ++	cat >synctest.c <<-\EOF &&
     ++	line1
     ++	changed
     ++	line3
      +	EOF
      +
     -+	git -c diff.cdiff.process="$BACKEND --mode=zero-hunk" \
     -+		diff zerohunk.c >actual &&
     -+	test_must_be_empty actual
     ++	# bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new
     ++	# line as changed, leaving 1 unchanged old vs 2 unchanged new.
     ++	# The synchronization invariant fails and git falls back.
     ++	git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \
     ++		diff synctest.c >actual 2>stderr &&
     ++	test_grep "changed" actual
     ++'
     ++
     ++test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
     ++	# boundary.c has 10 lines, so both hunks are in bounds
     ++	# but they overlap at lines 3-5, triggering the ordering check.
     ++	git -c diff.cdiff.process="$BACKEND --mode=overlap" \
     ++		diff boundary.c >actual 2>stderr &&
     ++	test_grep "NEW5" actual
      +'
      +
      +test_done
     +
     + ## userdiff.h ##
     +@@
     + 
     + #include "notes-cache.h"
     + 
     ++struct diff_subprocess;
     + struct index_state;
     + struct repository;
     + 
     +@@ userdiff.h: struct userdiff_driver {
     + 	int textconv_want_cache;
     + 	const char *process;
     + 	char *process_owned;
     ++	struct diff_subprocess *diff_subprocess;
     ++	unsigned diff_process_failed : 1;
     + };
     + enum userdiff_driver_type {
     + 	USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
 -:  ---------- > 5:  f4fd9aa682 diff: bypass diff process with --no-ext-diff and in format-patch
 4:  39ff53acef ! 6:  370e766978 blame: consult diff process for zero-hunk detection
     @@ Metadata
      Author: Michael Montalbo <mmontalbo@gmail.com>
      
       ## Commit message ##
     -    blame: consult diff process for zero-hunk detection
     +    blame: consult diff process for no-hunk detection
      
          When a diff process is configured via diff.<driver>.process,
          consult it during blame's per-commit diffing.  If the process
     -    returns zero hunks for a commit's changes to a file, treat the
     +    returns no hunks for a commit's changes to a file, treat the
          commit as having no changes, causing blame to attribute lines
          to earlier commits.
      
     +    The consultation happens at the pass_blame_to_parent() callsite
     +    using diff_process_fill_hunks(), matching how builtin_diff() in
     +    diff.c uses the same function.  A new diff_hunks_xpp() variant
     +    accepts a pre-populated xpparam_t for this callsite, while the
     +    existing diff_hunks() retains its original signature and behavior.
     +    The copy-detection callsite is unaffected since it does not use
     +    the diff process.
     +
          The subprocess is long-running (one startup cost amortized
          across the blame traversal), but each commit in the file's
          history incurs a round-trip to the tool.
      
          Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
      
     - ## Documentation/gitattributes.adoc ##
     -@@ Documentation/gitattributes.adoc: The tool responds with lines of the form
     - 
     - If the tool returns zero hunks with `status=success`, Git treats
     - the file as having no changes and produces no diff output.
     -+`git blame` also consults the diff process and skips commits
     -+where it reports zero hunks, attributing lines to earlier commits
     -+instead.
     - 
     - Tools should ignore unknown keys in the per-file request to
     - remain forward-compatible.
     -
       ## blame.c ##
      @@
       #include "tag.h"
       #include "trace2.h"
       #include "blame.h"
      +#include "diff-process.h"
     -+#include "userdiff.h"
     ++#include "xdiff-interface.h"
       #include "alloc.h"
       #include "commit-slab.h"
       #include "bloom.h"
      @@ blame.c: static struct commit *fake_working_tree_commit(struct repository *r,
       
       
     - static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
     + 
     +-static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
      -		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
     -+		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data,
     -+		      int xdl_opts, struct index_state *istate,
     -+		      const char *path)
     ++static int diff_hunks_xpp(mmfile_t *file_a, mmfile_t *file_b,
     ++			  xdl_emit_hunk_consume_func_t hunk_func,
     ++			  void *cb_data, xpparam_t *xpp)
       {
     - 	xpparam_t xpp = {0};
     +-	xpparam_t xpp = {0};
       	xdemitconf_t xecfg = {0};
       	xdemitcb_t ecb = {NULL};
     -+	struct xdl_hunk *ext_hunks = NULL;
     -+	int ret;
       
     - 	xpp.flags = xdl_opts;
     +-	xpp.flags = xdl_opts;
       	xecfg.hunk_func = hunk_func;
       	ecb.priv = cb_data;
      -	return xdi_diff(file_a, file_b, &xpp, &xecfg, &ecb);
     ++	return xdi_diff(file_a, file_b, xpp, &xecfg, &ecb);
     ++}
      +
     -+	if (path && istate) {
     -+		struct userdiff_driver *drv;
     -+		drv = userdiff_find_by_path(istate, path);
     -+		if (drv && drv->process) {
     -+			size_t nr = 0;
     -+			if (!diff_process_get_hunks(drv, path,
     -+						    file_a->ptr, file_a->size,
     -+						    file_b->ptr, file_b->size,
     -+						    &ext_hunks, &nr)) {
     -+				if (!nr) {
     -+					/*
     -+					 * Zero hunks: the diff process
     -+					 * considers these files equivalent.
     -+					 * Skip so blame looks past this
     -+					 * commit.
     -+					 */
     -+					return 0;
     -+				}
     -+				xpp.external_hunks = ext_hunks;
     -+				xpp.external_hunks_nr = nr;
     -+			}
     -+		}
     -+	}
     ++static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
     ++		      xdl_emit_hunk_consume_func_t hunk_func, void *cb_data, int xdl_opts)
     ++{
     ++	xpparam_t xpp = {0};
      +
     -+	ret = xdi_diff(file_a, file_b, &xpp, &xecfg, &ecb);
     -+	free(ext_hunks);
     -+	return ret;
     ++	xpp.flags = xdl_opts;
     ++	return diff_hunks_xpp(file_a, file_b, hunk_func, cb_data, &xpp);
       }
       
       static const char *get_next_line(const char *start, const char *end)
     +@@ blame.c: static void pass_blame_to_parent(struct blame_scoreboard *sb,
     + 				 struct blame_origin *parent, int ignore_diffs)
     + {
     + 	mmfile_t file_p, file_o;
     ++	xpparam_t xpp = {0};
     + 	struct blame_chunk_cb_data d;
     + 	struct blame_entry *newdest = NULL;
     + 
      @@ blame.c: static void pass_blame_to_parent(struct blame_scoreboard *sb,
       			 &sb->num_read_blob, ignore_diffs);
       	sb->num_get_patch++;
       
      -	if (diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, sb->xdl_opts))
     -+	if (diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, sb->xdl_opts,
     -+		       sb->revs->diffopt.repo->index, target->path))
     - 		die("unable to generate diff (%s -> %s)",
     - 		    oid_to_hex(&parent->commit->object.oid),
     - 		    oid_to_hex(&target->commit->object.oid));
     -@@ blame.c: static void find_copy_in_blob(struct blame_scoreboard *sb,
     - 	 * file_p partially may match that image.
     - 	 */
     - 	memset(split, 0, sizeof(struct blame_entry [3]));
     --	if (diff_hunks(file_p, &file_o, handle_split_cb, &d, sb->xdl_opts))
     -+	if (diff_hunks(file_p, &file_o, handle_split_cb, &d, sb->xdl_opts,
     -+		       NULL, NULL))
     - 		die("unable to generate diff (%s)",
     - 		    oid_to_hex(&parent->commit->object.oid));
     - 	/* remainder, if any, all match the preimage */
     +-		die("unable to generate diff (%s -> %s)",
     +-		    oid_to_hex(&parent->commit->object.oid),
     +-		    oid_to_hex(&target->commit->object.oid));
     ++	xpp.flags = sb->xdl_opts;
     ++	/*
     ++	 * If the diff process considers the files equivalent,
     ++	 * skip the diff so blame looks past this commit.
     ++	 */
     ++	if (diff_process_fill_hunks(&sb->revs->diffopt, target->path,
     ++				    &file_p, &file_o, &xpp)
     ++	    != DIFF_PROCESS_EQUIVALENT) {
     ++		if (diff_hunks_xpp(&file_p, &file_o, blame_chunk_cb,
     ++				   &d, &xpp))
     ++			die("unable to generate diff (%s -> %s)",
     ++			    oid_to_hex(&parent->commit->object.oid),
     ++			    oid_to_hex(&target->commit->object.oid));
     ++	}
     ++	free(xpp.external_hunks);
     + 	/* The rest are the same as the parent */
     + 	blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, 0,
     + 		    parent, target, 0);
      
       ## t/t4080-diff-process.sh ##
     -@@ t/t4080-diff-process.sh: test_expect_success PYTHON 'diff process zero hunks suppresses diff output' '
     - 	test_must_be_empty actual
     +@@ t/t4080-diff-process.sh: test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
     + 	test_grep "NEW5" actual
       '
       
     -+test_expect_success PYTHON 'blame skips commits with zero hunks from diff process' '
     ++#
     ++# Blame integration.
     ++#
     ++
     ++test_expect_success PYTHON 'blame uses tool-provided hunks' '
     ++	cat >blame-hunk.c <<-\EOF &&
     ++	line1
     ++	line2
     ++	line3
     ++	line4
     ++	original5
     ++	original6
     ++	line7
     ++	line8
     ++	line9
     ++	line10
     ++	EOF
     ++	git add blame-hunk.c &&
     ++	git commit -m "add blame-hunk.c" &&
     ++	ORIG=$(git rev-parse --short HEAD) &&
     ++
     ++	cat >blame-hunk.c <<-\EOF &&
     ++	line1
     ++	line2
     ++	line3
     ++	line4
     ++	changed5
     ++	changed6
     ++	line7
     ++	line8
     ++	changed9
     ++	changed10
     ++	EOF
     ++	git add blame-hunk.c &&
     ++	git commit -m "change blame-hunk.c" &&
     ++	CHANGE=$(git rev-parse --short HEAD) &&
     ++
     ++	# With fixed-hunk mode the tool reports only lines 5-6 as changed,
     ++	# so blame should attribute lines 9-10 to the original commit
     ++	# even though the builtin diff would show them as changed.
     ++	git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
     ++		blame blame-hunk.c >actual &&
     ++	sed -n "9p" actual >line9 &&
     ++	sed -n "10p" actual >line10 &&
     ++	test_grep "$ORIG" line9 &&
     ++	test_grep "$ORIG" line10 &&
     ++	sed -n "5p" actual >line5 &&
     ++	sed -n "6p" actual >line6 &&
     ++	test_grep "$CHANGE" line5 &&
     ++	test_grep "$CHANGE" line6
     ++'
     ++
     ++test_expect_success PYTHON 'blame skips commits with no hunks from diff process' '
      +	cat >blame.c <<-\EOF &&
      +	int main(void)
      +	{
     @@ t/t4080-diff-process.sh: test_expect_success PYTHON 'diff process zero hunks sup
      +	EOF
      +	git add blame.c &&
      +	git commit -m "add blame.c" &&
     ++	ORIG_COMMIT=$(git rev-parse --short HEAD) &&
      +
      +	cat >blame.c <<-\EOF &&
      +	int main(void)
     @@ t/t4080-diff-process.sh: test_expect_success PYTHON 'diff process zero hunks sup
      +	git commit -m "reformat blame.c" &&
      +	BLAME_COMMIT=$(git rev-parse --short HEAD) &&
      +
     -+	# Without zero-hunk mode, blame attributes the change.
     ++	# Without no-hunks mode, blame attributes the change.
      +	git blame blame.c >without &&
      +	test_grep "$BLAME_COMMIT" without &&
      +
     -+	# With zero-hunk mode, the process considers the files equivalent
     -+	# and blame skips the reformat commit.
     -+	git -c diff.cdiff.process="$BACKEND --mode=zero-hunk" \
     ++	# With no-hunks mode, the process considers the files equivalent
     ++	# and blame skips the reformat commit, attributing to the original.
     ++	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
      +		blame blame.c >with &&
     -+	! test_grep "$BLAME_COMMIT" with
     ++	test_grep ! "$BLAME_COMMIT" with &&
     ++	test_grep "$ORIG_COMMIT" with
      +'
      +
     ++test_expect_success PYTHON 'blame --no-ext-diff bypasses diff process' '
     ++	rm -f backend.log &&
     ++	git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
     ++		blame --no-ext-diff blame.c >actual &&
     ++	# Without the process, blame attributes the reformat commit normally.
     ++	test_grep "$BLAME_COMMIT" actual &&
     ++	test_path_is_missing backend.log
     ++'
     ++
     ++test_expect_success PYTHON 'blame --no-ext-diff uses builtin hunks' '
     ++	# fixed-hunk mode would narrow blame to lines 5-6, but
     ++	# --no-ext-diff should bypass it and use the builtin diff.
     ++	rm -f backend.log &&
     ++	git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk --log=backend.log" \
     ++		blame --no-ext-diff blame-hunk.c >actual &&
     ++	# Builtin diff attributes lines 9-10 to the change commit.
     ++	sed -n "9p" actual >line9 &&
     ++	test_grep "$CHANGE" line9 &&
     ++	test_path_is_missing backend.log
     ++'
      +
       test_done

-- 
gitgitgadget

^ permalink raw reply

* [PATCH v3 1/6] xdiff: support external hunks via xpparam_t
From: Michael Montalbo via GitGitGadget @ 2026-05-29 20:48 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2120.v3.git.1780087700.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add two new xpparam_t fields (external_hunks, external_hunks_nr)
that let callers supply pre-computed hunks.  When set, xdl_diff()
populates the changed[] arrays from these hunks instead of running
the diff algorithm, then continues through compaction and emission
as usual.

Validate supplied hunks before use: reject out-of-bounds line
numbers, overlapping or out-of-order hunks, negative counts, and
violations of the synchronization invariant (unchanged line counts
must match between files).  On validation failure, fall back to
the builtin diff algorithm; this re-runs xdl_prepare_env() since
the first call may have dirtied the changed[] arrays.

Skip trim_common_tail() in xdi_diff() when external hunks are
present, since external hunks reference line numbers in the
original content.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
 xdiff-interface.c |  7 +++-
 xdiff/xdiff.h     | 13 ++++++++
 xdiff/xdiffi.c    | 85 +++++++++++++++++++++++++++++++++++++++++++++--
 xdiff/xprepare.c  | 10 ++++++
 xdiff/xprepare.h  |  1 +
 5 files changed, 113 insertions(+), 3 deletions(-)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index f043330f2a..9542c0bcc2 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -124,7 +124,12 @@ int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t co
 	if (mf1->size > MAX_XDIFF_SIZE || mf2->size > MAX_XDIFF_SIZE)
 		return -1;
 
-	if (!xecfg->ctxlen && !(xecfg->flags & XDL_EMIT_FUNCCONTEXT))
+	/*
+	 * External hunks reference line numbers in the original content;
+	 * trimming the tail would change line counts and invalidate them.
+	 */
+	if (!xpp->external_hunks &&
+	    !xecfg->ctxlen && !(xecfg->flags & XDL_EMIT_FUNCCONTEXT))
 		trim_common_tail(&a, &b);
 
 	return xdl_diff(&a, &b, xpp, xecfg, xecb);
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index dc370712e9..a7e8502e4c 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -78,6 +78,15 @@ typedef struct s_mmbuffer {
 	long size;
 } mmbuffer_t;
 
+/*
+ * Hunk descriptor for externally computed diffs.
+ * Line numbers are 1-based, matching unified diff convention.
+ */
+struct xdl_hunk {
+	long old_start, old_count;
+	long new_start, new_count;
+};
+
 typedef struct s_xpparam {
 	unsigned long flags;
 
@@ -88,6 +97,10 @@ typedef struct s_xpparam {
 	/* See Documentation/diff-options.adoc. */
 	char **anchors;
 	size_t anchors_nr;
+
+	/* Externally computed hunks: bypass the diff algorithm.  Owned by caller. */
+	struct xdl_hunk *external_hunks;
+	size_t external_hunks_nr;
 } xpparam_t;
 
 typedef struct s_xdemitcb {
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 5455b4690d..b0a01ca856 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -1085,16 +1085,97 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
 	}
 }
 
+/*
+ * Populate the changed[] arrays from externally supplied hunks,
+ * bypassing the diff algorithm.  Validates that hunks are in order,
+ * non-overlapping, and within bounds.
+ *
+ * Returns 0 on success, -1 on validation failure.
+ */
+static int xdl_populate_hunks_from_external(xdfenv_t *xe,
+					    struct xdl_hunk *hunks,
+					    size_t nr_hunks)
+{
+	size_t i;
+	long j, prev_old_end = 0, prev_new_end = 0;
+	long total_old = 0, total_new = 0;
+
+	/*
+	 * xdl_prepare_env() may dirty changed[] via xdl_cleanup_records().
+	 * Clear them so only the external hunks are marked.
+	 */
+	xdl_clear_changed(&xe->xdf1);
+	xdl_clear_changed(&xe->xdf2);
+
+	for (i = 0; i < nr_hunks; i++) {
+		struct xdl_hunk *h = &hunks[i];
+
+		if (h->old_count < 0 || h->new_count < 0)
+			return -1;
+		if (h->old_start < 1 || h->new_start < 1)
+			return -1;
+
+		/*
+		 * Range must fit: start + count - 1 <= nrec,
+		 * rewritten to avoid overflow.  Same for both sides.
+		 *
+		 * When count is 0 (pure insert/delete) the check
+		 * reduces to 0 > nrec - start + 1, which rejects
+		 * start > nrec + 1 and allows start == nrec + 1
+		 * (the position after the last line).
+		 */
+		if (h->old_count > (long)xe->xdf1.nrec - h->old_start + 1)
+			return -1;
+		if (h->new_count > (long)xe->xdf2.nrec - h->new_start + 1)
+			return -1;
+
+		/* Ordering: no overlap with previous hunk (adjacent is OK) */
+		if (h->old_start < prev_old_end ||
+		    h->new_start < prev_new_end)
+			return -1;
+
+		for (j = 0; j < h->old_count; j++)
+			xe->xdf1.changed[h->old_start - 1 + j] = true;
+		for (j = 0; j < h->new_count; j++)
+			xe->xdf2.changed[h->new_start - 1 + j] = true;
+
+		prev_old_end = h->old_start + h->old_count;
+		prev_new_end = h->new_start + h->new_count;
+		total_old += h->old_count;
+		total_new += h->new_count;
+	}
+
+	/*
+	 * Synchronization invariant: unchanged line counts must match.
+	 * Otherwise xdl_build_script() would walk off one array.
+	 */
+	if ((long)xe->xdf1.nrec - total_old !=
+	    (long)xe->xdf2.nrec - total_new)
+		return -1;
+
+	return 0;
+}
+
 int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
 	xdchange_t *xscr;
 	xdfenv_t xe;
 	emit_func_t ef = xecfg->hunk_func ? xdl_call_hunk_func : xdl_emit_diff;
 
-	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0) {
+	if (xpp->external_hunks) {
+		if (xdl_prepare_env(mf1, mf2, xpp, &xe) < 0)
+			return -1;
+		if (xdl_populate_hunks_from_external(&xe,
+						     xpp->external_hunks,
+						     xpp->external_hunks_nr) == 0)
+			goto diff_done;
+		xdl_free_env(&xe);
+	}
 
+	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0)
 		return -1;
-	}
+
+diff_done:
 	if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
 	    xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 ||
 	    xdl_build_script(&xe, &xscr) < 0) {
diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index cd4fc405eb..4645a9a746 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -432,3 +432,13 @@ int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 
 	return 0;
 }
+
+/*
+ * Reset the changed[] array so that no lines are marked as changed.
+ * Also clears the sentinel slots at changed[-1] and changed[nrec]
+ * that xdl_change_compact() relies on during backward scans.
+ */
+void xdl_clear_changed(xdfile_t *xdf)
+{
+	memset(xdf->changed - 1, 0, (xdf->nrec + 2) * sizeof(bool));
+}
diff --git a/xdiff/xprepare.h b/xdiff/xprepare.h
index 947d9fc1bb..0413baf07b 100644
--- a/xdiff/xprepare.h
+++ b/xdiff/xprepare.h
@@ -28,6 +28,7 @@
 int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		    xdfenv_t *xe);
 void xdl_free_env(xdfenv_t *xe);
+void xdl_clear_changed(xdfile_t *xdf);
 
 
 
-- 
gitgitgadget


^ permalink raw reply related

* Re: [PATCH 0/3] pack-objects: support bitmaps and delta-islands with `--path-walk`
From: Taylor Blau @ 2026-05-29 20:07 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <22a7e32f-f645-4f00-bc5b-6b4309e483c2@gmail.com>

On Fri, May 29, 2026 at 01:26:33PM -0400, Derrick Stolee wrote:
> On 5/28/26 11:28 AM, Derrick Stolee wrote:
> > On 5/27/26 7:18 PM, Taylor Blau wrote:
>
> > Do you have any end-to-end performance data to demonstrate that these
> > changes are effective at scale? Are we still producing packfiles with the
> > pack-file compression and now with .bitmap files? How does this impact
> > the performance of a clone or fetch when using a bitmap index at read
> > time?
>
> Here's my attempt to use our existing performance tests to analyze the
> impact of this series.
>
> Running p5311 against the base of this topic and this topic with
> GIT_TEST_PACK_PATH_WALK=1, I get this output:

Yikes. That's not great, but see below for what I think is going on.

(As an aside, we can focus in on either lookup=true or lookup=false,
since these are just controlling whether or not the bitmap lookup table
is written. On a repository as small as git.git, this shouldn't make a
huge difference either way. I have a separate series to make this the
default and to clean up the t/perf suite accordingly, but haven't sent
it to the list yet.)

> Test                                     HEAD~3    HEAD
> -----------------------------------------------------------------
> [...]

I think this is either the primary reason why you're not seeing an
improvement here, or at least related to it...

> Do you have a good feeling for why the path-walk feature doesn't
> make a huge change in these test scenarios?

I think the problem is that we're relying on the TEST_ variable to tell
pack-objects to generate a pack using --path-walk, but treat it as a
fallback.

I suspect that since p5311 invokes repack *without* the '--path-walk'
option, we end up in this case within cmd_pack_objects():

	if (path_walk < 0) {
		if (use_bitmap_index > 0 ||
		    !use_internal_rev_list)
			path_walk = 0;
		else if (the_repository->gitdir &&
			 the_repository->settings.pack_use_path_walk)
			path_walk = 1;
		else
			path_walk = git_env_bool("GIT_TEST_PACK_PATH_WALK", 0);
	}

, where `path_walk` is *not* set, but `use_bitmap_index` is since p5311
set the 'pack.writeBitmaps' configuration (as a separate aside, this
should really prefer the non-deprecated 'repack.writeBitmaps' variant).

So in that case, we fall back to `path_walk = 0`, and don't even bother
reading the `GIT_TEST_PACK_PATH_WALK` variable.

If I modify the perf test like so:

--- 8< ---
diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
index 047efb995d6..1c9c99216e3 100755
--- a/t/perf/p5311-pack-bitmaps-fetch.sh
+++ b/t/perf/p5311-pack-bitmaps-fetch.sh
@@ -13,7 +13,7 @@ test_fetch_bitmaps () {
 	test_expect_success 'create bitmapped server repo' '
 		git config pack.writebitmaps true &&
 		git config pack.writeBitmapLookupTable '"$1"' &&
-		git repack -ad
+		git repack -ad --path-walk
 	'

 	# simulate a fetch from a repository that last fetched N days ago, for
--- >8 ---

, then I can get significantly improved results when running without the
GIT_TEST_PACK_PATH_WALK variablle (here I'm truncating the
'lookup=false' case, which performs nearly identically):

    Test                                       HEAD~3            HEAD
    ------------------------------------------------------------------------------------
    5311.4: server   (1 days) (lookup=true)    2.57(2.52+0.04)   0.03(0.02+0.00) -98.8%
    5311.5: size     (1 days)                           153.4K            153.4K +0.0%
    5311.6: client   (1 days) (lookup=true)    0.00(0.01+0.00)   0.00(0.01+0.00) =
    5311.8: server   (2 days) (lookup=true)    2.60(2.55+0.04)   0.02(0.02+0.00) -99.2%
    5311.9: size     (2 days)                           153.4K            153.4K +0.0%
    5311.10: client   (2 days) (lookup=true)   0.00(0.01+0.00)   0.00(0.01+0.00) =
    5311.12: server   (4 days) (lookup=true)   2.60(2.54+0.05)   0.03(0.03+0.00) -98.8%
    5311.13: size     (4 days)                          209.0K            209.0K +0.0%
    5311.14: client   (4 days) (lookup=true)   0.01(0.02+0.00)   0.01(0.01+0.00) +0.0%
    5311.16: server   (8 days) (lookup=true)   2.58(2.53+0.04)   0.03(0.03+0.00) -98.8%
    5311.17: size     (8 days)                          209.0K            209.0K +0.0%
    5311.18: client   (8 days) (lookup=true)   0.01(0.01+0.00)   0.01(0.02+0.00) +0.0%
    5311.20: server  (16 days) (lookup=true)   2.58(2.52+0.05)   0.03(0.03+0.00) -98.8%
    5311.21: size    (16 days)                          209.0K            209.0K +0.0%
    5311.22: client  (16 days) (lookup=true)   0.01(0.02+0.00)   0.01(0.01+0.00) +0.0%
    5311.24: server  (32 days) (lookup=true)   2.61(2.58+0.03)   0.03(0.02+0.01) -98.9%
    5311.25: size    (32 days)                          212.9K            212.9K +0.0%
    5311.26: client  (32 days) (lookup=true)   0.01(0.02+0.00)   0.02(0.02+0.00) +100.0%
    5311.28: server  (64 days) (lookup=true)   2.72(2.79+0.06)   0.19(0.30+0.03) -93.0%
    5311.29: size    (64 days)                            4.5M              4.5M -0.0%
    5311.30: client  (64 days) (lookup=true)   0.49(0.58+0.02)   0.48(0.56+0.04) -2.0%
    5311.32: server (128 days) (lookup=true)   2.90(3.21+0.09)   0.35(0.70+0.04) -87.9%
    5311.33: size   (128 days)                            9.4M              9.5M +0.4%
    5311.34: client (128 days) (lookup=true)   0.98(1.27+0.08)   0.98(1.33+0.06) +0.0%

My reading here is that we get significantly smaller packs (i.e. all
'test_size' tests drop from HEAD~3 to HEAD) in the same amount of time
(i.e. that all 'test_perf' tests are roughly flat).

That lines up with my expectation here, which is that even though we're
using bitmaps at read time, that's effectively seeding the verbatim pack
reuse over a significantly smaller pack, producing a much smaller output
pack as a result.

As to whether we should modify the perf suite to test this, naturally I
think we should. Likely that looks like modifying p5313 to re-run
`test_all_with_args` with '--use-bitmap-index' after repacking with
--path-walk and generating bitmaps like so (untested):

--- 8< ---
diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh
index 46a6cd32d24..663717982b1 100755
--- a/t/perf/p5313-pack-objects.sh
+++ b/t/perf/p5313-pack-objects.sh
@@ -22,6 +22,21 @@ test_expect_success 'create rev input' '
 	EOF
 '

+test_repack_with_args () {
+	args="$@"
+	export args
+
+	test_perf "repack with $args" '
+		git repack -adf $args
+	'
+
+	test_size "repack size with $args" '
+		gitdir=$(git rev-parse --git-dir) &&
+		pack=$(ls $gitdir/objects/pack/pack-*.pack) &&
+		test_file_size "$pack"
+	'
+}
+
 test_all_with_args () {
 	parameter=$1
 	export parameter
@@ -52,23 +67,22 @@ test_all_with_args () {
 	test_size "shallow pack size with $parameter" '
 		test_file_size out
 	'
-
-	test_perf "repack with $parameter" '
-		git repack -adf $parameter
-	'
-
-	test_size "repack size with $parameter" '
-		gitdir=$(git rev-parse --git-dir) &&
-		pack=$(ls $gitdir/objects/pack/pack-*.pack) &&
-		test_file_size "$pack"
-	'
 }

 for version in 1 2
 do
-	test_all_with_args --name-hash-version=$version
+	arg="--name-hash-version=$version" &&
+
+	test_all_with_args "$arg" &&
+	test_repack_with_args "$arg" || return 1
 done

 test_all_with_args --path-walk
+test_repack_with_args --path-walk
+
+# inverted order here: we want to test using reachability bitmaps on a
+# pack written with --path-walk
+test_repack_with_args --path-walk --write-bitmap-index
+test_all_with_args --use-bitmap-index

 test_done
--- >8 ---

I don't have a strong opinion on whether or not we should include that
in this series or elsewhere.

Thanks,
Taylor

^ permalink raw reply related

* Re: git hook question
From: Jeff King @ 2026-05-29 19:23 UTC (permalink / raw)
  To: Wesley Schwengle; +Cc: Git maillinglist
In-Reply-To: <c5527d8c-9147-4355-a07d-153d3977108e@opperschaap.net>

On Fri, May 29, 2026 at 12:11:59PM -0400, Wesley Schwengle wrote:

> >    git config hook.npm-test.command 'npm run test #'
> > 
> > Git will paste together the shell command:
> > 
> >    npm run test # "$@"
> 
> That doesn't work on my side:
> 
> $ cat ~/.config/git/js.config && git config --get hook.npm-test.command &&
> GIT_TRACE=1 git poh
> [hook "npm-test"]
>   event = pre-push
>   command = npm run test #
>   enabled = true

The "#" is being eaten by the config parser as a comment, so the value
is effectively the same as what you originally had. As you noticed,
putting it in double-quotes fixes that, though I'd probably do the whole
thing for readability like:

  command = "npm run test #"

Which is also what "git config" would write with the command I showed
above.

> Also seems to fail:
> 
> [hook "npm-test"]
>   event = pre-push
>   command = git npm-test
>   enabled = true
> 
> [alias]
>   npm-test = !f() { npm run test; }; f

This is also a config quoting problem. Both "#" and semicolon begin
comments. Putting the whole thing in double-quotes works.

> The following circles back a little to the first response.
> 
> Tt kind of diverges from `git hook run pre-push' and how additional
> arguments are given on the command line with that invocation. Wrappers need
> to become aware on way it is called, either via hook or via a manual way,
> because of the `remote url' that gets added.

I don't think the hooks themselves should need to be aware. If somebody
is calling "git hook run pre-push" without providing arguments, they are
breaking the contract to the hooks. You can get away with it if you know
your particular hooks do not care about those arguments, but in the
general case, what should a pre-push hook that _does_ care about the
remote name do when it doesn't get any arguments? It's an error.

I guess there's a more fundamental question: why are you running "git
hook" in the first place? If it is just to test out your hooks, that's
fine. But to make the test more realistic, you may want to give it
arguments (and stdin input) to match the specific hook you're testing.

> Normal hooks get that info via their STDIN, wouldn't this also make sense
> for these type of hooks? It makes differentiation much easier.

Usually we pass fixed-size information via arguments, and
arbitrary-sized information over stdin. In pre-push you have the joy of
dealing with both. So your "npm run test" hook is also going to have its
stdin hooked up to a pipe with the ref updates sent over it. That might
be OK if it never reads from stdin, but it may also cause some surprises
depending on what the "test" target runs under the hood.

So whether you are getting input as arguments or over stdin, it's
probably something the hook needs to deal with (or at least think
about).

-Peff

^ permalink raw reply

* Re: [PATCH] commit-reach: stop sorting in paint_down_to_common()
From: Jeff King @ 2026-05-29 19:04 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git List
In-Reply-To: <107314f5-0057-4ed3-9bee-9dca4f424bd1@web.de>

On Fri, May 29, 2026 at 05:32:58PM +0200, René Scharfe wrote:

> > Seems like an easy win. If some of the callers do not even look at the
> > result, could we avoid building it at all in those cases (e.g., by
> > passing in a NULL result pointer)?
> 
> Yes, at the cost of adding NULL checks to paint_down_to_common().  Which
> is probably worth it.

Yeah. I thought it was only one line (where we append to "tail"), but
there are a couple spots where "result" is referenced directly.

Another fun fact: it looks like paint_down_to_common() makes sure to
clean up the output list before returning an error. But the callers do
so, too, which is redundant. That would go away if they just passed in
NULL. :)

-Peff

^ permalink raw reply

* Re: Suggetsions for collaboration workflows in large repos
From: Matthew Hughes @ 2026-05-29 18:06 UTC (permalink / raw)
  To: git
In-Reply-To: <20260529163117.z2auhbg4sdxxgmis@archP14s>

On Fri, May 29, 2026 at 05:31:17PM +0100, Matthew Hughes wrote:
> I thought about doing something like tracking
> `refs/heads*/some-colleague-branch` from the remote, since with the wildcard
> `*` I at least won't the fatal error on the missing reference during fetch, but
> that risks my config containing an ever growing list of such wildcards, or a
> bunch of manual work occasionally cleaning up old ones (or maybe that could be
> automated).

I hacked some scripts to automate this. Firstly, one for fetching:

1. Fetches the branch
2. Adds a fetch config with wildcard hacks so `git fetch` brings in updates for
  that branch (the refspec should match _exactly_ that branch and never
  anything more)
3. Adds a separate ref to record that we're tracking this branch (so something
  knows to clean it up later)

    #!/usr/bin/env bash

    set -o errexit -o pipefail -o nounset

    # save command as e.g. git-fetch-other
    CMD_NAME="$(basename "$0" | sed 's/git-//g')"
    if [ $# -lt 1 ]
    then
        echo "usage: git $CMD_NAME branch-name [ remote-name ]" >&2
        exit 1
    fi

    BRANCH_NAME="$1"
    REMOTE_NAME="${2:-origin}"
    FETCH_CONFIG_NAME="remote.$REMOTE_NAME.fetch"

    git fetch "$REMOTE_NAME" "$BRANCH_NAME"
    git checkout -b "$BRANCH_NAME"

    # we want to record that we are tracking this branch, to do this create
    # a new ref whose name tells us what we're tracking, but whose value is
    # unimportant. So as a placeholder value just use the hash of an empty tree
    # taken from https://git.kernel.org/pub/scm/git/git.git/commit/?id=9c8a294a1ae1335511475db9c0eb8841c0ec9738
    EMPTY_TREE_REF="$(git hash-object -t tree /dev/null)"

    # refspec used to track the branch: we expect branches to be deleted from the
    # upstream when merged so tracking exactly:
    # "+refs/heads/$BRANCH_NAME:refs/remotes/$REMOTE_NAME/$BRANCH_NAME" will error
    # when we go to fetch that exact ref after its removed upstream.
    # so HACK around this: add wildcards that we still expect to only ever match
    # this exact branch (but doesn't have the issue of git complaining when it
    # tries to fetch an _exact_ ref)
    TRACKING_REFSPEC="+refs/heads*/$BRANCH_NAME:refs/remotes*/$REMOTE_NAME/$BRANCH_NAME"

    # record that we're tracking this branch. First check we've not already
    # recorded this, then ...
    if ! git config get --local --fixed-value --value "$TRACKING_REFSPEC" "$FETCH_CONFIG_NAME" >/dev/null
    then
        # ... set the config to track it for fetching, and ...
        git config set --comment "$CMD_NAME: tracking at $(date -I)"  --local --append "$FETCH_CONFIG_NAME" "$TRACKING_REFSPEC"
        # ... record that we have special cased this tracking
        git update-ref "refs/tracked/$REMOTE_NAME/$BRANCH_NAME" "$EMPTY_TREE_REF"
    fi

And the cleanup script (needs to be run periodically):

1. Collects all the remote branches we know about
2. Checks all the references from step 3. above and checks if any branches
defined there are missing remotes (I have fetch.prune=true to keep the remote
tracking references up-to-date)
3. If they are, drops the tracking config for that branch

    #!/usr/bin/env bash

    set -o errexit -o pipefail -o nounset

    REMOTE_NAME="${1:-origin}"
    TRACKED_REF_PREFIX="refs/tracked/$REMOTE_NAME"
    REMOTE_REF_PREFIX="refs/remotes/$REMOTE_NAME"

    declare -A remote_branch_lookup
    while read -r remote_ref
    do
        # strip prefix, e.g. 'refs/remotes/origin/some-branch' -> 'some-branch'
        branch_name="${remote_ref#$REMOTE_REF_PREFIX/}"
        remote_branch_lookup["$branch_name"]=1
    done < <(git for-each-ref --format='%(refname)' "$REMOTE_REF_PREFIX/")

    while read -r tracking_info
    do
        tracked_branch="${tracking_info#$TRACKED_REF_PREFIX/}"
        if ! [[ -v "remote_branch_lookup[$tracked_branch]" ]]
        then
            echo "branch $tracked_branch has been removed from the remote, untracking it"
            git update-ref -d "$TRACKED_REF_PREFIX/$tracked_branch"

            tracking_refspec="+refs/heads*/$tracked_branch:refs/remotes*/$REMOTE_NAME/$tracked_branch"
            git config unset --local --fixed-value --value "$tracking_refspec" "remote.$REMOTE_NAME.fetch"
        fi
    done < <(git for-each-ref --format='%(refname)' "$TRACKED_REF_PREFIX/")

So functionally I think this allows for the workflow I want, but does feel like
a big ol' hack :>


^ permalink raw reply

* Re: Suggetsions for collaboration workflows in large repos
From: Ben Knoble @ 2026-05-29 17:56 UTC (permalink / raw)
  To: Matthew Hughes; +Cc: git
In-Reply-To: <20260529163117.z2auhbg4sdxxgmis@archP14s>


> Le 29 mai 2026 à 12:47, Matthew Hughes <matthewhughes934@gmail.com> a écrit :
> 
> Hi,
> 
> I'm looking for some git workflow suggestions to help cut down on unnecessary
> fetching when working in a large repo with many (hundreds) of other devs and
> thousands of branches. Specifically, if in this repo I use the common config to
> just fetch all the remote heads:
> 
>    $ git config set remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'
> 
> Then I find I get a lot of noise from the all the branches being
> created/updated/deleted as well as an increase in the size of my local repo due
> to all the objects I need to fetch across all those branches.
> 
> To clarify the general performance of git in this repo is reasonable (shoutout
> to `scalar`) but I am interested in cutting down on this fetching since when
> working in this repo I'm generally only interested in a tiny subset of all
> branches:
> 
> 1. The `main` branch (that everyone merges into)
> 2. Any of _my_ branches
> 3. Occasionally, one of my colleagues branches, so e.g. I can check out their
>   code locally to review (most reviewing I do in the web UI, this is
>   GitHub)
> 
> I have a prefix for all my branches: `mhughes-`, so to sort out just the
> first two points I can configure git to fetch `main` and references with that
> prefix:
> 
>    $ git config set --comment 'fetch main' remote.origin.fetch '+refs/heads/main:refs/remotes/origin/main'
>    $ git config set --append --comment 'fetch my branches' remote.origin.fetch '+refs/heads/mhughes-*:refs/remotes/origin/mhughes-*'
> 
> But then when I do want to check out a colleague's branch I need to explicitly
> fetch the exact ref like:
> 
>    $ git fetch origin some-colleague-branch
>    $ git checkout FETCH_HEAD -b some-colleague-branch
> 
> Which is ok (it's my current workflow), but it means I have to re-fetch the
> exact ref if I want to bring in changes that they make after my initial fetch
> 
> I could add an explicit fetch of their branch like:
> 
>    $ git config set --append remote.origin.fetch '+refs/heads/some-colleague-branch:refs/remotes/origin/some-colleague-branch'
> 
> So that each `git fetch` also brings in updates to that branch, but in the
> remote we delete branches once their changes are merged, so if I leave that
> config I'll eventually (once they merge their change and delete the branch) run
> into errors when fetching like:
> 
>    fatal: couldn't find remote ref refs/heads/some-colleague-branch
> 
> Does anyone have suggestions to make this smoother? Or alternative workflows
> for achieving this goal? I'd also be curious to hear about other approaches
> people take went working in large repos with lots of other collaborators.
> Or am I just using git wrong in a repo like this, and should adopt another
> approach?
> 
> I thought about doing something like tracking
> `refs/heads*/some-colleague-branch` from the remote, since with the wildcard
> `*` I at least won't the fatal error on the missing reference during fetch, but
> that risks my config containing an ever growing list of such wildcards, or a
> bunch of manual work occasionally cleaning up old ones (or maybe that could be
> automated).
> 
> Thanks,
> Matt

My current advice is to enable git-maintenance on such a repo, where prefetches and commit graphs and so on will give you a nice perf boost. Then I keep the default fetch all heads config and don’t mind the noise too much.

^ permalink raw reply

* Re: git hook question
From: Ben Knoble @ 2026-05-29 17:52 UTC (permalink / raw)
  To: Wesley Schwengle; +Cc: Jeff King, Git maillinglist
In-Reply-To: <c5527d8c-9147-4355-a07d-153d3977108e@opperschaap.net>


> Le 29 mai 2026 à 12:17, Wesley Schwengle <wesleys@opperschaap.net> a écrit :
> 
> On 5/29/26 01:21, Jeff King wrote:
>>> On Fri, May 29, 2026 at 01:01:34AM -0400, Wesley Schwengle wrote:
>>> I understand the why, normally pre-push gets `<local-ref> SP
>>> <local-object-name> SP <remote-ref> SP <remote-object-name> LF'. This has a
>>> similar feel, albeit a different syntax. The difference feels like a minor
>>> bug, but not one I'm worried about at this moment: you would expect it to
>>> get the same arguments/parameters as the regular pre-push hook. But I
>>> digress.
>> I think the "git hook" command is mostly intended for scripting, and the
>> caller is expected to understand the context and provide the appropriate
>> arguments. The hook command itself doesn't know about what a "pre-push"
>> hook should look like.
>> So not a bug, but definitely a gotcha that could perhaps be better
>> explained in the documentation.
> 
> I think the "normal" pre-push makes more sense than the one I'm seeing right now, but perhaps that's me. But I think that the docs would perhaps need an update to why this `remote url' are the arguments. Especially if you read `githooks(5)' it seems a little strange.

The hooks manual says that the pre-push hook is invoked with 2 parameters (name and location of the destination). 

^ permalink raw reply

* Re: [PATCH 0/3] pack-objects: support bitmaps and delta-islands with `--path-walk`
From: Derrick Stolee @ 2026-05-29 17:26 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <a708e23d-e0c2-48c9-86e9-1227f12edd53@gmail.com>

On 5/28/26 11:28 AM, Derrick Stolee wrote:
> On 5/27/26 7:18 PM, Taylor Blau wrote:

> Do you have any end-to-end performance data to demonstrate that these
> changes are effective at scale? Are we still producing packfiles with the
> pack-file compression and now with .bitmap files? How does this impact
> the performance of a clone or fetch when using a bitmap index at read
> time?

Here's my attempt to use our existing performance tests to analyze the
impact of this series.

Running p5311 against the base of this topic and this topic with
GIT_TEST_PACK_PATH_WALK=1, I get this output:

Test                                     HEAD~3    HEAD
-----------------------------------------------------------------
5311.4: server   (1 days) (lookup=true)     0.02   0.03 +50.0%
5311.5: size     (1 days)                   6.8K 124.9K +1730.9%
5311.6: client   (1 days) (lookup=true)     0.02   0.01 -50.0%
5311.8: server   (2 days) (lookup=true)     0.02   0.03 +50.0%
5311.9: size     (2 days)                   6.8K 124.9K +1730.9%
5311.10: client   (2 days) (lookup=true)    0.02   0.01 -50.0%
5311.12: server   (4 days) (lookup=true)    0.02   0.03 +50.0%
5311.13: size     (4 days)                  6.8K 124.9K +1730.9%
5311.14: client   (4 days) (lookup=true)    0.02   0.01 -50.0%
5311.16: server   (8 days) (lookup=true)    0.03   0.03 +0.0%
5311.17: size     (8 days)                 37.3K 186.0K +398.2%
5311.18: client   (8 days) (lookup=true)    0.03   0.02 -33.3%
5311.20: server  (16 days) (lookup=true)    0.02   0.03 +50.0%
5311.21: size    (16 days)                 37.3K 186.0K +398.2%
5311.22: client  (16 days) (lookup=true)    0.03   0.02 -33.3%
5311.24: server  (32 days) (lookup=true)    0.03   0.03 +0.0%
5311.25: size    (32 days)                 46.5K 197.2K +324.3%
5311.26: client  (32 days) (lookup=true)    0.03   0.02 -33.3%
5311.28: server  (64 days) (lookup=true)    0.24   0.16 -33.3%
5311.29: size    (64 days)                  1.5M   5.1M +239.8%
5311.30: client  (64 days) (lookup=true)    0.42   0.35 -16.7%
5311.32: server (128 days) (lookup=true)    0.49   0.29 -40.8%
5311.33: size   (128 days)                  4.1M   9.8M +139.5%
5311.34: client (128 days) (lookup=true)    0.86   0.65 -24.4%
5311.38: server   (1 days) (lookup=false)   0.02   0.03 +50.0%
5311.39: size     (1 days)                  6.8K 124.9K +1730.9%
5311.40: client   (1 days) (lookup=false)   0.02   0.02 +0.0%
5311.42: server   (2 days) (lookup=false)   0.02   0.03 +50.0%
5311.43: size     (2 days)                  6.8K 124.9K +1730.9%
5311.44: client   (2 days) (lookup=false)   0.02   0.02 +0.0%
5311.46: server   (4 days) (lookup=false)   0.02   0.03 +50.0%
5311.47: size     (4 days)                  6.8K 124.9K +1730.9%
5311.48: client   (4 days) (lookup=false)   0.02   0.02 +0.0%
5311.50: server   (8 days) (lookup=false)   0.02   0.03 +50.0%
5311.51: size     (8 days)                 37.3K 186.0K +398.2%
5311.52: client   (8 days) (lookup=false)   0.03   0.02 -33.3%
5311.54: server  (16 days) (lookup=false)   0.02   0.03 +50.0%
5311.55: size    (16 days)                 37.3K 186.0K +398.2%
5311.56: client  (16 days) (lookup=false)   0.03   0.02 -33.3%
5311.58: server  (32 days) (lookup=false)   0.03   0.03 +0.0%
5311.59: size    (32 days)                 46.5K 197.2K +324.3%
5311.60: client  (32 days) (lookup=false)   0.03   0.02 -33.3%
5311.62: server  (64 days) (lookup=false)   0.25   0.17 -32.0%
5311.63: size    (64 days)                  1.5M   5.1M +239.8%
5311.64: client  (64 days) (lookup=false)   0.43   0.37 -14.0%
5311.66: server (128 days) (lookup=false)   0.50   0.29 -42.0%
5311.67: size   (128 days)                  4.1M   9.8M +138.6%
5311.68: client (128 days) (lookup=false)   0.87   0.67 -23.0%

It's important to realize that even with the test variable, the
path-walk logic is overriding the bitmap logic in the HEAD~3
case.

What's happening is that the path-walk mode (without bitmaps)
is computing a smaller packfile for all of these cases. Some
are significantly smaller, but only when it's a very small
pack anyway. The bitmap case is faster only for larger fetches.

I did the same test without the path-walk feature and both columns
looked the same (as expected, no change due to this series) and
the data matched the path-walk test's HEAD column pretty closely.
So this shows that adding path-walk to bitmap-focused efforts is
not a regression on any of these dimensions.

This test was for my local copy of the Git repository, including
all the forks I fetch. I hoped the results would be different
for repositories that have data shapes that struggle with
name-hash collisions, but microsoft/fluentui is an example that
I've used for path-walk repacks before and it had similar data.

Do you have a good feeling for why the path-walk feature doesn't
make a huge change in these test scenarios?

Thanks,
-Stolee


^ permalink raw reply

* Suggetsions for collaboration workflows in large repos
From: Matthew Hughes @ 2026-05-29 16:31 UTC (permalink / raw)
  To: git

Hi,

I'm looking for some git workflow suggestions to help cut down on unnecessary
fetching when working in a large repo with many (hundreds) of other devs and
thousands of branches. Specifically, if in this repo I use the common config to
just fetch all the remote heads:

    $ git config set remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'

Then I find I get a lot of noise from the all the branches being
created/updated/deleted as well as an increase in the size of my local repo due
to all the objects I need to fetch across all those branches.

To clarify the general performance of git in this repo is reasonable (shoutout
to `scalar`) but I am interested in cutting down on this fetching since when
working in this repo I'm generally only interested in a tiny subset of all
branches:

1. The `main` branch (that everyone merges into)
2. Any of _my_ branches
3. Occasionally, one of my colleagues branches, so e.g. I can check out their
   code locally to review (most reviewing I do in the web UI, this is
   GitHub)

I have a prefix for all my branches: `mhughes-`, so to sort out just the
first two points I can configure git to fetch `main` and references with that
prefix:

    $ git config set --comment 'fetch main' remote.origin.fetch '+refs/heads/main:refs/remotes/origin/main'
    $ git config set --append --comment 'fetch my branches' remote.origin.fetch '+refs/heads/mhughes-*:refs/remotes/origin/mhughes-*'

But then when I do want to check out a colleague's branch I need to explicitly
fetch the exact ref like:

    $ git fetch origin some-colleague-branch
    $ git checkout FETCH_HEAD -b some-colleague-branch

Which is ok (it's my current workflow), but it means I have to re-fetch the
exact ref if I want to bring in changes that they make after my initial fetch

I could add an explicit fetch of their branch like:

    $ git config set --append remote.origin.fetch '+refs/heads/some-colleague-branch:refs/remotes/origin/some-colleague-branch'

So that each `git fetch` also brings in updates to that branch, but in the
remote we delete branches once their changes are merged, so if I leave that
config I'll eventually (once they merge their change and delete the branch) run
into errors when fetching like:

    fatal: couldn't find remote ref refs/heads/some-colleague-branch

Does anyone have suggestions to make this smoother? Or alternative workflows
for achieving this goal? I'd also be curious to hear about other approaches
people take went working in large repos with lots of other collaborators.
Or am I just using git wrong in a repo like this, and should adopt another
approach?

I thought about doing something like tracking
`refs/heads*/some-colleague-branch` from the remote, since with the wildcard
`*` I at least won't the fatal error on the missing reference during fetch, but
that risks my config containing an ever growing list of such wildcards, or a
bunch of manual work occasionally cleaning up old ones (or maybe that could be
automated).

Thanks,
Matt

^ permalink raw reply

* Re: git hook question
From: Wesley @ 2026-05-29 16:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Git maillinglist
In-Reply-To: <c5527d8c-9147-4355-a07d-153d3977108e@opperschaap.net>

On 5/29/26 12:11, Wesley Schwengle wrote:

>> Git will paste together the shell command:
>>
>>    npm run test # "$@"
> 
> That doesn't work on my side:
> 
> $ cat ~/.config/git/js.config && git config --get hook.npm-test.command 
> && GIT_TRACE=1 git poh
> [hook "npm-test"]
>    event = pre-push
>    command = npm run test #
>    enabled = true
> npm run test

It does work with:
`command = npm run test "#"'

Small oversight.

Cheers,
Wesley

-- 
Wesley

Why not both?


^ permalink raw reply

* Re: git hook question
From: Wesley Schwengle @ 2026-05-29 16:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Git maillinglist
In-Reply-To: <20260529052141.GA1099450@coredump.intra.peff.net>

On 5/29/26 01:21, Jeff King wrote:
> On Fri, May 29, 2026 at 01:01:34AM -0400, Wesley Schwengle wrote:
> 
>> I understand the why, normally pre-push gets `<local-ref> SP
>> <local-object-name> SP <remote-ref> SP <remote-object-name> LF'. This has a
>> similar feel, albeit a different syntax. The difference feels like a minor
>> bug, but not one I'm worried about at this moment: you would expect it to
>> get the same arguments/parameters as the regular pre-push hook. But I
>> digress.
> 
> I think the "git hook" command is mostly intended for scripting, and the
> caller is expected to understand the context and provide the appropriate
> arguments. The hook command itself doesn't know about what a "pre-push"
> hook should look like.
> 
> So not a bug, but definitely a gotcha that could perhaps be better
> explained in the documentation.

I think the "normal" pre-push makes more sense than the one I'm seeing 
right now, but perhaps that's me. But I think that the docs would 
perhaps need an update to why this `remote url' are the arguments. 
Especially if you read `githooks(5)' it seems a little strange.

>> My actual question is: Is there a way to tell the hook "Don't give me
>> arguments, just run the plain command that is defined". I looked in `man 1
>> git-hook', but I was unable to find something that looks like it.
> 
> I don't think so; the command is expected to handle (or ignore) the
> arguments as appropriate. You could obviously write a wrapper script to
> handle that, but since hook commands are run with a shell you can inline
> it, like:
> 
>    git config hook.npm-test.command 'npm run test #'
> 
> Git will paste together the shell command:
> 
>    npm run test # "$@"

That doesn't work on my side:

$ cat ~/.config/git/js.config && git config --get hook.npm-test.command 
&& GIT_TRACE=1 git poh
[hook "npm-test"]
   event = pre-push
   command = npm run test #
   enabled = true
npm run test

[snip, alias expansion]

11:49:46.746800 run-command.c:673       trace: run_command: git push 
origin HEAD
11:49:46.746811 run-command.c:765       trace: start_command: 
/home/wesleys/.local/libexec/git-core/git push origin HEAD
11:49:46.749640 git.c:502               trace: built-in: git push origin 
HEAD
11:49:46.752107 run-command.c:673       trace: run_command: unset 
GIT_PREFIX; ssh git@gitlab.com 'git-receive-pack '\''some/repo'\'''
11:49:46.752135 run-command.c:765       trace: start_command: 
/usr/bin/ssh git@gitlab.com 'git-receive-pack '\''some/repo'\'''
11:49:47.549946 run-command.c:1576      run_processes_parallel: 
preparing to run up to 1 tasks
11:49:47.549988 run-command.c:673       trace: run_command: 'npm run 
test' origin git@gitlab.com:some/repo
11:49:47.550012 run-command.c:765       trace: start_command: /bin/sh -c 
'npm run test "$@"' 'npm run test' origin git@gitlab.com:some/repo

> @skirbi/semtic@0.0.18 test
> tap origin git@gitlab.com:some/repo

No valid test files found matching "origin" "git@gitlab.com:some/repo"
11:49:48.145805 run-command.c:1604      run_processes_parallel: done
error: failed to push some refs to 'gitlab.com:some/repo'

> The more
> general form of this trick is to use a shell function, like:
> 
>    f() { your_cmd_here; }; f

Also seems to fail:

[hook "npm-test"]
   event = pre-push
   command = git npm-test
   enabled = true

[alias]
   npm-test = !f() { npm run test; }; f


11:53:14.678237 run-command.c:673       trace: run_command: 'f() { npm 
run test' origin git@gitlab.com:some/repo
11:53:14.678248 run-command.c:765       trace: start_command: /bin/sh -c 
'f() { npm run test "$@"' 'f() { npm run test' origin 
git@gitlab.com:some/repo
f() { npm run test: 1: Syntax error: end of file unexpected (expecting "}")

The wrapper script seems the only viable solution. I do think it's a 
little annoying, because any linter, tester, thing that gets called by 
this infra now needs to add wrappers. Which means you either need to 
start making a githook repo for all the tests that you have.

The following circles back a little to the first response.

Tt kind of diverges from `git hook run pre-push' and how additional 
arguments are given on the command line with that invocation. Wrappers 
need to become aware on way it is called, either via hook or via a 
manual way, because of the `remote url' that gets added.

Normal hooks get that info via their STDIN, wouldn't this also make 
sense for these type of hooks? It makes differentiation much easier.

Cheers,
Wesley

-- 
Wesley Schwengle


^ permalink raw reply

* [PATCH] index-pack: retain child bases in delta cache
From: Arijit Banerjee via GitGitGadget @ 2026-05-29 16:06 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee, Arijit Banerjee, Arijit Banerjee

From: Arijit Banerjee <arijit@effectiveailabs.com>

When resolving a delta whose result has children of its own,
index-pack adds the result to work_head, accounts its data in
base_cache_used, and calls prune_base_data(). It then immediately
frees that same data.

This bypasses the existing delta base cache policy and can force later
descendants to reconstruct the queued base again. Let the existing
delta_base_cache_limit pruning policy decide whether to keep or evict
the data instead.

Signed-off-by: Arijit Banerjee <arijit@effectiveailabs.com>
---
    index-pack: retain child bases in delta cache
    
    Speed up the local pack indexing phase of clone/fetch for large
    delta-compressed packs by keeping reconstructed delta bases available
    for reuse when they are queued for later delta resolution.
    
    When index-pack reconstructs a child base and queues it for resolving
    descendant deltas, it currently frees that data immediately. This can
    force the same base to be reconstructed again. Instead, keep it in the
    existing delta base cache and let the existing delta_base_cache_limit
    policy decide whether to retain or evict it.
    
    This does not add a new cache or increase the cache limit. The object
    data is already accounted in base_cache_used, and prune_base_data() is
    already called at this point.
    
    Correctness:
    
     * t/t5302-pack-index.sh passed all 36 tests.
    
    Benchmarks on a quiet Ubuntu 24.04 VM, 16 vCPU, 32 GiB RAM, local SSD:
    
    pack baseline patched wall-time change RSS change linux blobless 69.17s
    57.98s 16.2% faster -0.0% linux full 280.72s 236.32s 15.8% faster +1.9%
    
    Five-repeat public-repo medians also improved: git.git 13.1%, libgit2
    14.0%, redis 13.5%, cpython 4.8%.
    
    Perf on the linux blobless pack showed the same direction under
    profiling: 76.64s baseline vs 61.09s patched, with similar RSS.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2131%2Farijit91%2Findex-pack-retain-child-base-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2131/arijit91/index-pack-retain-child-base-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2131

 builtin/index-pack.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index cf0bd8280d..027c64b522 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1212,7 +1212,6 @@ static void *threaded_second_pass(void *data)
 			list_add(&child->list, &work_head);
 			base_cache_used += child->size;
 			prune_base_data(NULL);
-			free_base_data(child);
 		} else if (child) {
 			/*
 			 * This child does not have its own children. It may be

base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH] commit-reach: stop sorting in paint_down_to_common()
From: René Scharfe @ 2026-05-29 15:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List
In-Reply-To: <20260529084325.GF1106035@coredump.intra.peff.net>

On 5/29/26 10:43 AM, Jeff King wrote:
> On Wed, May 27, 2026 at 05:52:17PM +0200, René Scharfe wrote:
> 
>> None of the three callers of paint_down_to_common() care about the order
>> of its result list: merge_bases_many() sorts it again after removing
>> stale items, remove_redundant_no_gen() and repo_in_merge_bases_many()
>> throw the list away without even looking at it.  So drop the unnecessary
>> commit_list_sort_by_date() call.
> 
> Seems like an easy win. If some of the callers do not even look at the
> result, could we avoid building it at all in those cases (e.g., by
> passing in a NULL result pointer)?

Yes, at the cost of adding NULL checks to paint_down_to_common().  Which
is probably worth it.
> I guess there is not much to be gained, though. The result is a list of
> merge bases, so it should usually be rather small. The benefit in your
> patch is probably not performance, but just reducing the size of the
> code.
True.  The list can be arbitrarily long, but should only contain a
handful commits in normal repos.

René


^ permalink raw reply

* Re: [BUG] internal date format does not accept small unix timestamps
From: Luna Schwalbe @ 2026-05-29 14:52 UTC (permalink / raw)
  To: Kristoffer Haugsbakk, git
In-Reply-To: <a8e51dda-7b1d-426e-9af9-cf856c42342d@app.fastmail.com>

 > Apparently you need `@` in front for small Unix Epoch values. `@0 +0000`

That is wonderful, thank you so much, I somehow did not find this small 
detail anywhere.

Maybe it could be added to Documentation/date-formats.adoc?

Luna

^ permalink raw reply

* [PATCH v2] config.mak.uname: avoid macOS linker warning on Xcode 16.3+
From: Harald Nordgren via GitGitGadget @ 2026-05-29 14:32 UTC (permalink / raw)
  To: git; +Cc: Harald Nordgren, Harald Nordgren
In-Reply-To: <pull.2313.git.git.1779901919956.gitgitgadget@gmail.com>

From: Harald Nordgren <haraldnordgren@gmail.com>

Building on macOS with Xcode 16.3 or newer emits:

    ld: warning: reducing alignment of section __DATA,__common
    from 0x8000 to 0x4000 because it exceeds segment maximum
    alignment

Pass -fno-common when "ld -v" reports ld-1167 or newer, so tentative
definitions of large arrays go into BSS instead of __DATA,__common.

Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
---
    pkt-line: initialize packet_buffer to avoid macOS linker warning
    
     * Check MacOS ld version instead
       (https://en.wikipedia.org/wiki/Xcode#Xcode_15.0_-_16.x_(since_visionOS_support)_2)
    
    Parsing output of
    
    ❯ ld -v
    @(#)PROGRAM:ld PROJECT:ld-1267
    BUILD 18:30:29 Apr 22 2026
    configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em armv8m.main armv8.1m.main
    will use ld-classic for: armv6 armv7 armv7s i386 armv6m armv7k armv7m armv7em
    LTO support using: LLVM version 21.0.0 (static support for 30, runtime is 30)
    TAPI support using: Apple TAPI version 21.0.0 (tapi-2100.0.2.6)
    

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2313%2FHaraldNordgren%2Fpkt-line-init-buffer-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2313/HaraldNordgren/pkt-line-init-buffer-v2
Pull-Request: https://github.com/git/git/pull/2313

Range-diff vs v1:

 1:  1c1c66d85b < -:  ---------- pkt-line: initialize packet_buffer to avoid macOS linker warning
 -:  ---------- > 1:  0e660a346e config.mak.uname: avoid macOS linker warning on Xcode 16.3+


 config.mak.uname | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/config.mak.uname b/config.mak.uname
index ce5e7de779..d4d55cb324 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -163,6 +163,12 @@ ifeq ($(uname_S),Darwin)
 		NEEDS_GOOD_LIBICONV = UnfortunatelyYes
         endif
 
+	# Silence Xcode 16.3+ linker warning about __DATA,__common alignment.
+	LD_MAJOR_VERSION = $(shell ld -v 2>&1 | sed -n 's/.*PROJECT:ld-\([0-9]*\).*/\1/p')
+        ifeq ($(shell test "$(LD_MAJOR_VERSION)" -ge 1167 && echo 1),1)
+		BASIC_CFLAGS += -fno-common
+        endif
+
 	# The builtin FSMonitor on MacOS builds upon Simple-IPC.  Both require
 	# Unix domain sockets and PThreads.
         ifndef NO_PTHREADS

base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d
-- 
gitgitgadget

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox