* [PATCH v2] diff-index: don't queue unchanged filepairs with diff_change()
@ 2025-11-30 11:47 René Scharfe
2025-11-30 18:02 ` Junio C Hamano
0 siblings, 1 reply; 5+ messages in thread
From: René Scharfe @ 2025-11-30 11:47 UTC (permalink / raw)
To: Git List; +Cc: D. Ben Knoble, Jeff King, Phillip Wood, Junio C Hamano
diff_cache() queues unchanged filepairs if the flag find_copies_harder
is set, and uses diff_change() for that. This function allocates a
filespec for each side, does a few other things that are unnecessary for
unchanged filepairs and always sets the diff_flag has_changes, which is
simply misleading in this case.
Add a new streamlined function for queuing unchanged filepairs and
use it in show_modified(), which is called by diff_cache() via
oneway_diff() and do_oneway_diff(). It allocates only a single filespec
for each filepair and uses it twice with reference counting. This has a
measurable effect if there are a lot of them, like in the Linux repo:
Benchmark 1: ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder
Time (mean ± σ): 31.8 ms ± 0.2 ms [User: 24.2 ms, System: 6.3 ms]
Range (min … max): 31.5 ms … 32.3 ms 85 runs
Benchmark 2: ./git -C ../linux diff --cached --find-copies-harder
Time (mean ± σ): 23.9 ms ± 0.2 ms [User: 18.1 ms, System: 4.6 ms]
Range (min … max): 23.5 ms … 24.4 ms 111 runs
Summary
./git -C ../linux diff --cached --find-copies-harder ran
1.33 ± 0.01 times faster than ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Changes since v1:
- Clearer description of memory usage in the commit message.
diff-lib.c | 13 ++++++-------
diff.c | 20 ++++++++++++++++++++
diff.h | 5 +++++
3 files changed, 31 insertions(+), 7 deletions(-)
diff --git a/diff-lib.c b/diff-lib.c
index b8f8f3bc31..8e624f38c6 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -418,13 +418,12 @@ static int show_modified(struct rev_info *revs,
}
oldmode = old_entry->ce_mode;
- if (mode == oldmode && oideq(oid, &old_entry->oid) && !dirty_submodule &&
- !revs->diffopt.flags.find_copies_harder)
- return 0;
-
- diff_change(&revs->diffopt, oldmode, mode,
- &old_entry->oid, oid, 1, !is_null_oid(oid),
- old_entry->name, 0, dirty_submodule);
+ if (mode != oldmode || !oideq(oid, &old_entry->oid) || dirty_submodule)
+ diff_change(&revs->diffopt, oldmode, mode,
+ &old_entry->oid, oid, 1, !is_null_oid(oid),
+ old_entry->name, 0, dirty_submodule);
+ else if (revs->diffopt.flags.find_copies_harder)
+ diff_same(&revs->diffopt, mode, oid, old_entry->name);
return 0;
}
diff --git a/diff.c b/diff.c
index 915317025f..63d33251cd 100644
--- a/diff.c
+++ b/diff.c
@@ -7348,6 +7348,26 @@ void diff_change(struct diff_options *options,
concatpath, old_dirty_submodule, new_dirty_submodule);
}
+void diff_same(struct diff_options *options,
+ unsigned mode,
+ const struct object_id *oid,
+ const char *concatpath)
+{
+ struct diff_filespec *one;
+
+ if (S_ISGITLINK(mode) && is_submodule_ignored(concatpath, options))
+ return;
+
+ if (options->prefix &&
+ strncmp(concatpath, options->prefix, options->prefix_length))
+ return;
+
+ one = alloc_filespec(concatpath);
+ fill_filespec(one, oid, 1, mode);
+ one->count++;
+ diff_queue(&diff_queued_diff, one, one);
+}
+
struct diff_filepair *diff_unmerge(struct diff_options *options, const char *path)
{
struct diff_filepair *pair;
diff --git a/diff.h b/diff.h
index 31eedd5c0c..e80503aebb 100644
--- a/diff.h
+++ b/diff.h
@@ -572,6 +572,11 @@ void diff_change(struct diff_options *,
const char *fullpath,
unsigned dirty_submodule1, unsigned dirty_submodule2);
+void diff_same(struct diff_options *,
+ unsigned mode,
+ const struct object_id *oid,
+ const char *fullpath);
+
struct diff_filepair *diff_unmerge(struct diff_options *, const char *path);
void compute_diffstat(struct diff_options *options, struct diffstat_t *diffstat,
--
2.52.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH v2] diff-index: don't queue unchanged filepairs with diff_change() 2025-11-30 11:47 [PATCH v2] diff-index: don't queue unchanged filepairs with diff_change() René Scharfe @ 2025-11-30 18:02 ` Junio C Hamano 2025-12-02 21:16 ` René Scharfe 2025-12-02 22:07 ` René Scharfe 0 siblings, 2 replies; 5+ messages in thread From: Junio C Hamano @ 2025-11-30 18:02 UTC (permalink / raw) To: René Scharfe; +Cc: Git List, D. Ben Knoble, Jeff King, Phillip Wood René Scharfe <l.s.r@web.de> writes: > Add a new streamlined function for queuing unchanged filepairs and > use it in show_modified(), which is called by diff_cache() via > oneway_diff() and do_oneway_diff(). It allocates only a single filespec > for each filepair and uses it twice with reference counting. This has a > measurable effect if there are a lot of them, like in the Linux repo: > > Benchmark 1: ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder > Time (mean ± σ): 31.8 ms ± 0.2 ms [User: 24.2 ms, System: 6.3 ms] > Range (min … max): 31.5 ms … 32.3 ms 85 runs > > Benchmark 2: ./git -C ../linux diff --cached --find-copies-harder > Time (mean ± σ): 23.9 ms ± 0.2 ms [User: 18.1 ms, System: 4.6 ms] > Range (min … max): 23.5 ms … 24.4 ms 111 runs > > Summary > ./git -C ../linux diff --cached --find-copies-harder ran > 1.33 ± 0.01 times faster than ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder Nice. Is this technique only applicable to diff-index among the three diff plumbing siblings? I suspect diff-files is an oddball in that on the working tree side we do not necessarily have the blob object names, but it would apply to diff-tree, wouldn't it? Will queue. Thanks. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] diff-index: don't queue unchanged filepairs with diff_change() 2025-11-30 18:02 ` Junio C Hamano @ 2025-12-02 21:16 ` René Scharfe 2025-12-02 22:07 ` René Scharfe 1 sibling, 0 replies; 5+ messages in thread From: René Scharfe @ 2025-12-02 21:16 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git List, D. Ben Knoble, Jeff King, Phillip Wood On 11/30/25 7:02 PM, Junio C Hamano wrote: > René Scharfe <l.s.r@web.de> writes: > >> Add a new streamlined function for queuing unchanged filepairs and >> use it in show_modified(), which is called by diff_cache() via >> oneway_diff() and do_oneway_diff(). It allocates only a single filespec >> for each filepair and uses it twice with reference counting. This has a >> measurable effect if there are a lot of them, like in the Linux repo: >> >> Benchmark 1: ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder >> Time (mean ± σ): 31.8 ms ± 0.2 ms [User: 24.2 ms, System: 6.3 ms] >> Range (min … max): 31.5 ms … 32.3 ms 85 runs >> >> Benchmark 2: ./git -C ../linux diff --cached --find-copies-harder >> Time (mean ± σ): 23.9 ms ± 0.2 ms [User: 18.1 ms, System: 4.6 ms] >> Range (min … max): 23.5 ms … 24.4 ms 111 runs >> >> Summary >> ./git -C ../linux diff --cached --find-copies-harder ran >> 1.33 ± 0.01 times faster than ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder > > Nice. Is this technique only applicable to diff-index among the > three diff plumbing siblings? I suspect diff-files is an oddball > in that on the working tree side we do not necessarily have the > blob object names Indeed: - git diff-files compares index and working tree, - a copy is a new file with contents from an old file, - git ignores new files in the working tree. So in theory git diff-files can only detect copies in the other direction. Or is there a way I'm missing? In practice, however, it doesn't do that reliably because it simply skips up-to-date index entries. Oops. --- >8 --- Subject: [PATCH v2 2/1] diff-files: fix copy detection Copy detection cannot work when comparing the index to the working tree because Git ignores files that it is not explicitly told to track. It should work in the other direction, though, i.e. for a reverse diff of the deletion of a copy from the index. d1f2d7e8ca (Make run_diff_index() use unpack_trees(), not read_tree(), 2008-01-19) broke it with a seemingly stray change to run_diff_files(). We didn't notice because there's no test for that. But even if we had one, it might have gone unnoticed because the breakage only happens with index preloading, which requires at least 1000 entries (more than most test repos have) and is racy because it runs in parallel with the actual command. Fix copy detection by queuing up-to-date and skip-worktree entries using diff_same(). While at it, use diff_same() also for queuing unchanged files not flagged as up-to-date, i.e. clean submodules and entries where preloading was not done at all or not quickly enough. It uses less memory than diff_change() and doesn't unnecessarily set the diff flag has_changes. Add two tests to cover running both without and with preloading. The first one passes reliably with the original code. The second one enables preloading and thus is racy. It has a good chance to pass even without the fix, but fails within seconds when running the test script with --stress. With the fix it runs fine for several minutes, until my patience runs out. Signed-off-by: René Scharfe <l.s.r@web.de> --- Patch formatted with -U9 for easier review of the second hunk. diff-lib.c | 12 +++++++++--- t/t4007-rename-3.sh | 23 ++++++++++++++++++++++- 2 files changed, 31 insertions(+), 4 deletions(-) diff --git a/diff-lib.c b/diff-lib.c index 8e624f38c6..5307390ff3 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -220,20 +220,24 @@ void run_diff_files(struct rev_info *revs, unsigned int option) * from the desired stage. */ pair = diff_unmerge(&revs->diffopt, ce->name); if (wt_mode) pair->two->mode = wt_mode; if (ce_stage(ce) != diff_unmerged_stage) continue; } - if (ce_uptodate(ce) || ce_skip_worktree(ce)) + if (ce_uptodate(ce) || ce_skip_worktree(ce)) { + if (revs->diffopt.flags.find_copies_harder) + diff_same(&revs->diffopt, ce->ce_mode, + &ce->oid, ce->name); continue; + } /* * When CE_VALID is set (via "update-index --assume-unchanged" * or via adding paths while core.ignorestat is set to true), * the user has promised that the working tree file for that * path will not be modified. When CE_FSMONITOR_VALID is true, * the fsmonitor knows that the path hasn't been modified since * we refreshed the cached stat information. In either case, * we do not have to stat to see if the path has been removed @@ -266,20 +270,22 @@ void run_diff_files(struct rev_info *revs, unsigned int option) changed = match_stat_with_submodule(&revs->diffopt, ce, &st, ce_option, &dirty_submodule); newmode = ce_mode_from_stat(ce, st.st_mode); } if (!changed && !dirty_submodule) { ce_mark_uptodate(ce); mark_fsmonitor_valid(istate, ce); - if (!revs->diffopt.flags.find_copies_harder) - continue; + if (revs->diffopt.flags.find_copies_harder) + diff_same(&revs->diffopt, newmode, + &ce->oid, ce->name); + continue; } oldmode = ce->ce_mode; old_oid = &ce->oid; new_oid = changed ? null_oid(the_hash_algo) : &ce->oid; diff_change(&revs->diffopt, oldmode, newmode, old_oid, new_oid, !is_null_oid(old_oid), !is_null_oid(new_oid), ce->name, 0, dirty_submodule); diff --git a/t/t4007-rename-3.sh b/t/t4007-rename-3.sh index 3fc81bcd76..1012a370dd 100755 --- a/t/t4007-rename-3.sh +++ b/t/t4007-rename-3.sh @@ -61,19 +61,40 @@ cat >expected <<EOF :000000 100644 $ZERO_OID $blob A path1/COPYING EOF test_expect_success 'copy, limited to a subtree' ' git diff-index -C --find-copies-harder $tree path1 >current && compare_diff_raw current expected ' test_expect_success 'tweak work tree' ' - rm -f path0/COPYING && + rm -f path0/COPYING +' + +cat >expected <<EOF +:100644 100644 $blob $blob C100 path1/COPYING path0/COPYING +EOF + +# The cache has path0/COPYING and path1/COPYING, the working tree only +# path1/COPYING. This is a deletion -- we don't treat deduplication +# specially. In reverse it should be detected as a copy, though. +test_expect_success 'copy detection, files to index' ' + git diff-files -C --find-copies-harder -R >current && + compare_diff_raw current expected +' + +test_expect_success 'copy detection, files to preloaded index' ' + GIT_TEST_PRELOAD_INDEX=1 \ + git diff-files -C --find-copies-harder -R >current && + compare_diff_raw current expected +' + +test_expect_success 'tweak index' ' git update-index --remove path0/COPYING ' # In the tree, there is only path0/COPYING. In the cache, path0 does # not have COPYING anymore and path1 has COPYING which is a copy of # path0/COPYING. Showing the full tree with cache should tell us about # the rename. cat >expected <<EOF :100644 100644 $blob $blob R100 path0/COPYING path1/COPYING -- 2.52.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2] diff-index: don't queue unchanged filepairs with diff_change() 2025-11-30 18:02 ` Junio C Hamano 2025-12-02 21:16 ` René Scharfe @ 2025-12-02 22:07 ` René Scharfe 2025-12-03 15:06 ` René Scharfe 1 sibling, 1 reply; 5+ messages in thread From: René Scharfe @ 2025-12-02 22:07 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git List, D. Ben Knoble, Jeff King, Phillip Wood On 11/30/25 7:02 PM, Junio C Hamano wrote: > René Scharfe <l.s.r@web.de> writes: > >> Add a new streamlined function for queuing unchanged filepairs and >> use it in show_modified(), which is called by diff_cache() via >> oneway_diff() and do_oneway_diff(). It allocates only a single filespec >> for each filepair and uses it twice with reference counting. This has a >> measurable effect if there are a lot of them, like in the Linux repo: >> >> Benchmark 1: ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder >> Time (mean ± σ): 31.8 ms ± 0.2 ms [User: 24.2 ms, System: 6.3 ms] >> Range (min … max): 31.5 ms … 32.3 ms 85 runs >> >> Benchmark 2: ./git -C ../linux diff --cached --find-copies-harder >> Time (mean ± σ): 23.9 ms ± 0.2 ms [User: 18.1 ms, System: 4.6 ms] >> Range (min … max): 23.5 ms … 24.4 ms 111 runs >> >> Summary >> ./git -C ../linux diff --cached --find-copies-harder ran >> 1.33 ± 0.01 times faster than ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder > > Nice. Is this technique only applicable to diff-index among the > three diff plumbing siblings? > [...] it would apply to diff-tree, wouldn't it? Yes, but its diff_change() call is behind two layers of callbacks, which complicates things. And I don't know how to avoid adding an object ID comparison. Do we perhaps have that bit somewhere in tree-diff.c already and can pass it along the pathchange call? And I wonder now if diff_same() is the right name. Shouldn't it be diff_keep() or a similar verb to match the siblings diff_change(), diff_addremove() and diff_unmerge()? Performance looks mixed. E.g. memory usage is reduced slightly here: $ for git in ./git_v2.52.0 ./git do for i in $(seq 5) do /usr/bin/time -l $git diff-tree --find-copies-harder -r v2.51.0 v2.51.1 2>&1 >/dev/null done done | grep peak 36111744 peak memory footprint 36291968 peak memory footprint 36177280 peak memory footprint 36177280 peak memory footprint 36291968 peak memory footprint 35456384 peak memory footprint 35489152 peak memory footprint 35505536 peak memory footprint 35636608 peak memory footprint 35505536 peak memory footprint But diff-tree needs 1% more time with the patch: Benchmark 1: ./git_v2.52.0 diff-tree --find-copies-harder -r v2.51.0 v2.51.1 Time (mean ± σ): 78.3 ms ± 0.2 ms [User: 57.4 ms, System: 19.8 ms] Range (min … max): 77.9 ms … 78.7 ms 36 runs Benchmark 2: ./git diff-tree --find-copies-harder -r v2.51.0 v2.51.1 Time (mean ± σ): 78.8 ms ± 0.2 ms [User: 57.9 ms, System: 19.8 ms] Range (min … max): 78.4 ms … 79.2 ms 36 runs Summary ./git_v2.52.0 diff-tree --find-copies-harder -r v2.51.0 v2.51.1 ran 1.01 ± 0.00 times faster than ./git diff-tree --find-copies-harder -r v2.51.0 v2.51.1 Other examples look better: Benchmark 1: ./git_v2.52.0 -C ../linux diff-tree --find-copies-harder -r v6.8 v6.9 Time (mean ± σ): 110.7 ms ± 0.2 ms [User: 83.1 ms, System: 26.8 ms] Range (min … max): 110.2 ms … 111.4 ms 26 runs Benchmark 2: ./git -C ../linux diff-tree --find-copies-harder -r v6.8 v6.9 Time (mean ± σ): 105.3 ms ± 0.4 ms [User: 78.9 ms, System: 24.0 ms] Range (min … max): 104.7 ms … 106.0 ms 27 runs Summary ./git -C ../linux diff-tree --find-copies-harder -r v6.8 v6.9 ran 1.05 ± 0.00 times faster than ./git_v2.52.0 -C ../linux diff-tree --find-copies-harder -r v6.8 v6.9 But overall I'm not impressed. :-| René --- builtin/reset.c | 1 + diff.c | 1 + diff.h | 7 ++++++- diffcore.h | 4 ++-- tree-diff.c | 8 ++++++-- 5 files changed, 16 insertions(+), 5 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index ed35802af1..ec674694dd 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -210,6 +210,7 @@ static int read_from_tree(const struct pathspec *pathspec, opt.repo = the_repository; opt.change = diff_change; opt.add_remove = diff_addremove; + opt.keep = diff_same; if (pathspec->nr && pathspec_needs_expanded_index(the_repository->index, pathspec)) ensure_full_index(the_repository->index); diff --git a/diff.c b/diff.c index 436da250eb..9671524d2b 100644 --- a/diff.c +++ b/diff.c @@ -4847,6 +4847,7 @@ void repo_diff_setup(struct repository *r, struct diff_options *options) /* pathchange left =NULL by default */ options->change = diff_change; options->add_remove = diff_addremove; + options->keep = diff_same; options->use_color = diff_use_color_default; options->detect_rename = diff_detect_rename_default; options->xdl_opts |= diff_algorithm; diff --git a/diff.h b/diff.h index 7eb84aadf4..6dfec55039 100644 --- a/diff.h +++ b/diff.h @@ -43,7 +43,8 @@ struct oidset; * set_default in diff_options can be used to tweak this more. * * - As you find different pairs of files, call `diff_change()` to feed - * modified files, `diff_addremove()` to feed created or deleted files, or + * modified files, `diff_addremove()` to feed created or deleted files, + * `diff_same()` to feed unmodified files if needed for copy detection, or * `diff_unmerge()` to feed a file whose state is 'unmerged' to the API. * These are thin wrappers to a lower-level `diff_queue()` function that is * flexible enough to record any of these kinds of changes. @@ -92,6 +93,9 @@ typedef void (*add_remove_fn_t)(struct diff_options *options, int oid_valid, const char *fullpath, unsigned dirty_submodule); +typedef void (*keep_fn_t)(struct diff_options *options, unsigned mode, + const struct object_id *oid, const char *fullpath); + typedef void (*diff_format_fn_t)(struct diff_queue_struct *q, struct diff_options *options, void *data); @@ -384,6 +388,7 @@ struct diff_options { pathchange_fn_t pathchange; change_fn_t change; add_remove_fn_t add_remove; + keep_fn_t keep; void *change_fn_data; diff_format_fn_t format_callback; void *format_callback_data; diff --git a/diffcore.h b/diffcore.h index 9c0a0e7aaf..64b419b33f 100644 --- a/diffcore.h +++ b/diffcore.h @@ -35,8 +35,8 @@ struct userdiff_driver; /** * the internal representation for a single file (blob). It records the blob * object name (if known -- for a work tree file it typically is a NUL SHA-1), - * filemode and pathname. This is what the `diff_addremove()`, `diff_change()` - * and `diff_unmerge()` synthesize and feed `diff_queue()` function with. + * filemode and pathname. This is what `diff_addremove()`, `diff_change()`, + * `diff_same()` and `diff_unmerge()` synthesize and feed `diff_queue()`. */ struct diff_filespec { struct object_id oid; diff --git a/tree-diff.c b/tree-diff.c index 5988148b60..c5e2cf6a69 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -167,8 +167,12 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_ { struct combine_diff_parent *p0 = &p->parent[0]; if (p->mode && p0->mode) { - opt->change(opt, p0->mode, p->mode, &p0->oid, &p->oid, - 1, 1, p->path, 0, 0); + if (opt->flags.find_copies_harder && opt->keep && + p0->mode == p->mode && oideq(&p0->oid, &p->oid)) + opt->keep(opt, p->mode, &p->oid, p->path); + else + opt->change(opt, p0->mode, p->mode, &p0->oid, &p->oid, + 1, 1, p->path, 0, 0); } else { const struct object_id *oid; -- 2.52.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2] diff-index: don't queue unchanged filepairs with diff_change() 2025-12-02 22:07 ` René Scharfe @ 2025-12-03 15:06 ` René Scharfe 0 siblings, 0 replies; 5+ messages in thread From: René Scharfe @ 2025-12-03 15:06 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git List, D. Ben Knoble, Jeff King, Phillip Wood On 12/2/25 11:07 PM, René Scharfe wrote: > On 11/30/25 7:02 PM, Junio C Hamano wrote: >> René Scharfe <l.s.r@web.de> writes: >> >>> Add a new streamlined function for queuing unchanged filepairs and >>> use it in show_modified(), which is called by diff_cache() via >>> oneway_diff() and do_oneway_diff(). It allocates only a single filespec >>> for each filepair and uses it twice with reference counting. This has a >>> measurable effect if there are a lot of them, like in the Linux repo: >>> >>> Benchmark 1: ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder >>> Time (mean ± σ): 31.8 ms ± 0.2 ms [User: 24.2 ms, System: 6.3 ms] >>> Range (min … max): 31.5 ms … 32.3 ms 85 runs >>> >>> Benchmark 2: ./git -C ../linux diff --cached --find-copies-harder >>> Time (mean ± σ): 23.9 ms ± 0.2 ms [User: 18.1 ms, System: 4.6 ms] >>> Range (min … max): 23.5 ms … 24.4 ms 111 runs >>> >>> Summary >>> ./git -C ../linux diff --cached --find-copies-harder ran >>> 1.33 ± 0.01 times faster than ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder >> >> Nice. Is this technique only applicable to diff-index among the >> three diff plumbing siblings? > >> [...] it would apply to diff-tree, wouldn't it? > Yes, but its diff_change() call is behind two layers of callbacks, which > complicates things. > > And I don't know how to avoid adding an object ID comparison. Do we > perhaps have that bit somewhere in tree-diff.c already and can pass it > along the pathchange call? Yes, new patch below. Not sure if it's better, though. > Benchmark 1: ./git_v2.52.0 diff-tree --find-copies-harder -r v2.51.0 v2.51.1 > Time (mean ± σ): 78.3 ms ± 0.2 ms [User: 57.4 ms, System: 19.8 ms] > Range (min … max): 77.9 ms … 78.7 ms 36 runs > > Benchmark 2: ./git diff-tree --find-copies-harder -r v2.51.0 v2.51.1 > Time (mean ± σ): 78.8 ms ± 0.2 ms [User: 57.9 ms, System: 19.8 ms] > Range (min … max): 78.4 ms … 79.2 ms 36 runs > > Summary > ./git_v2.52.0 diff-tree --find-copies-harder -r v2.51.0 v2.51.1 ran > 1.01 ± 0.00 times faster than ./git diff-tree --find-copies-harder -r v2.51.0 v2.51.1 Benchmark 1: ./git_v2.52.0 diff-tree --find-copies-harder -r v2.51.0 v2.51.1 Time (mean ± σ): 75.6 ms ± 0.6 ms [User: 57.4 ms, System: 17.2 ms] Range (min … max): 75.0 ms … 78.0 ms 37 runs Benchmark 2: ./git diff-tree --find-copies-harder -r v2.51.0 v2.51.1 Time (mean ± σ): 76.0 ms ± 0.2 ms [User: 57.9 ms, System: 17.1 ms] Range (min … max): 75.6 ms … 76.4 ms 37 runs Summary ./git_v2.52.0 diff-tree --find-copies-harder -r v2.51.0 v2.51.1 ran 1.00 ± 0.01 times faster than ./git diff-tree --find-copies-harder -r v2.51.0 v2.51.1 Hmm, I probably ran some background task when I measured yesterday and got worse results for the git_v2.52.0 baseline than today. René --- builtin/reset.c | 1 + diff.c | 1 + diff.h | 9 +++++++-- diffcore.h | 4 ++-- tree-diff.c | 48 +++++++++++++++++++++++++++--------------------- 5 files changed, 38 insertions(+), 25 deletions(-) diff --git a/builtin/reset.c b/builtin/reset.c index ed35802af1..ec674694dd 100644 --- a/builtin/reset.c +++ b/builtin/reset.c @@ -210,6 +210,7 @@ static int read_from_tree(const struct pathspec *pathspec, opt.repo = the_repository; opt.change = diff_change; opt.add_remove = diff_addremove; + opt.keep = diff_same; if (pathspec->nr && pathspec_needs_expanded_index(the_repository->index, pathspec)) ensure_full_index(the_repository->index); diff --git a/diff.c b/diff.c index 436da250eb..9671524d2b 100644 --- a/diff.c +++ b/diff.c @@ -4847,6 +4847,7 @@ void repo_diff_setup(struct repository *r, struct diff_options *options) /* pathchange left =NULL by default */ options->change = diff_change; options->add_remove = diff_addremove; + options->keep = diff_same; options->use_color = diff_use_color_default; options->detect_rename = diff_detect_rename_default; options->xdl_opts |= diff_algorithm; diff --git a/diff.h b/diff.h index 7eb84aadf4..2e3a5ac04a 100644 --- a/diff.h +++ b/diff.h @@ -43,7 +43,8 @@ struct oidset; * set_default in diff_options can be used to tweak this more. * * - As you find different pairs of files, call `diff_change()` to feed - * modified files, `diff_addremove()` to feed created or deleted files, or + * modified files, `diff_addremove()` to feed created or deleted files, + * `diff_same()` to feed unmodified files if needed for copy detection, or * `diff_unmerge()` to feed a file whose state is 'unmerged' to the API. * These are thin wrappers to a lower-level `diff_queue()` function that is * flexible enough to record any of these kinds of changes. @@ -76,7 +77,7 @@ struct rev_info; struct userdiff_driver; typedef int (*pathchange_fn_t)(struct diff_options *options, - struct combine_diff_path *path); + struct combine_diff_path *path, bool is_change); typedef void (*change_fn_t)(struct diff_options *options, unsigned old_mode, unsigned new_mode, @@ -92,6 +93,9 @@ typedef void (*add_remove_fn_t)(struct diff_options *options, int oid_valid, const char *fullpath, unsigned dirty_submodule); +typedef void (*keep_fn_t)(struct diff_options *options, unsigned mode, + const struct object_id *oid, const char *fullpath); + typedef void (*diff_format_fn_t)(struct diff_queue_struct *q, struct diff_options *options, void *data); @@ -384,6 +388,7 @@ struct diff_options { pathchange_fn_t pathchange; change_fn_t change; add_remove_fn_t add_remove; + keep_fn_t keep; void *change_fn_data; diff_format_fn_t format_callback; void *format_callback_data; diff --git a/diffcore.h b/diffcore.h index 9c0a0e7aaf..64b419b33f 100644 --- a/diffcore.h +++ b/diffcore.h @@ -35,8 +35,8 @@ struct userdiff_driver; /** * the internal representation for a single file (blob). It records the blob * object name (if known -- for a work tree file it typically is a NUL SHA-1), - * filemode and pathname. This is what the `diff_addremove()`, `diff_change()` - * and `diff_unmerge()` synthesize and feed `diff_queue()` function with. + * filemode and pathname. This is what `diff_addremove()`, `diff_change()`, + * `diff_same()` and `diff_unmerge()` synthesize and feed `diff_queue()`. */ struct diff_filespec { struct object_id oid; diff --git a/tree-diff.c b/tree-diff.c index 5988148b60..e711456766 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -163,12 +163,17 @@ static int tree_entry_pathcmp(struct tree_desc *t1, struct tree_desc *t2) * emits diff to first parent only, and tells diff tree-walker that we are done * with p and it can be freed. */ -static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_diff_path *p) +static int emit_diff_first_parent_only(struct diff_options *opt, + struct combine_diff_path *p, + bool is_change) { struct combine_diff_parent *p0 = &p->parent[0]; if (p->mode && p0->mode) { - opt->change(opt, p0->mode, p->mode, &p0->oid, &p->oid, - 1, 1, p->path, 0, 0); + if (is_change) + opt->change(opt, p0->mode, p->mode, &p0->oid, &p->oid, + 1, 1, p->path, 0, 0); + else if (opt->keep) + opt->keep(opt, p->mode, &p->oid, p->path); } else { const struct object_id *oid; @@ -205,7 +210,7 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_ static void emit_path(struct combine_diff_path ***tail, struct strbuf *base, struct diff_options *opt, int nparent, struct tree_desc *t, struct tree_desc *tp, - int imin, int depth) + int imin, int depth, bool is_change) { unsigned short mode; const char *path; @@ -288,7 +293,7 @@ static void emit_path(struct combine_diff_path ***tail, keep = 1; if (opt->pathchange) - keep = opt->pathchange(opt, p); + keep = opt->pathchange(opt, p, is_change); if (keep) { **tail = p; @@ -518,26 +523,27 @@ static void ll_diff_tree_paths( /* t = p[imin] */ if (cmp == 0) { /* are either pi > p[imin] or diff(t,pi) != ø ? */ - if (!opt->flags.find_copies_harder) { - for (i = 0; i < nparent; ++i) { - /* p[i] > p[imin] */ - if (tp[i].entry.mode & S_IFXMIN_NEQ) - continue; + bool is_change = true; - /* diff(t,pi) != ø */ - if (!oideq(&t.entry.oid, &tp[i].entry.oid) || - (t.entry.mode != tp[i].entry.mode)) - continue; + for (i = 0; i < nparent; ++i) { + /* p[i] > p[imin] */ + if (tp[i].entry.mode & S_IFXMIN_NEQ) + continue; - goto skip_emit_t_tp; - } + /* diff(t,pi) != ø */ + if (!oideq(&t.entry.oid, &tp[i].entry.oid) || + (t.entry.mode != tp[i].entry.mode)) + continue; + + is_change = false; + break; } /* D += {δ(t,pi) if pi=p[imin]; "+a" if pi > p[imin]} */ - emit_path(tail, base, opt, nparent, - &t, tp, imin, depth); + if (is_change || opt->flags.find_copies_harder) + emit_path(tail, base, opt, nparent, + &t, tp, imin, depth, is_change); - skip_emit_t_tp: /* t↓, ∀ pi=p[imin] pi↓ */ update_tree_entry(&t); update_tp_entries(tp, nparent); @@ -547,7 +553,7 @@ static void ll_diff_tree_paths( else if (cmp < 0) { /* D += "+t" */ emit_path(tail, base, opt, nparent, - &t, /*tp=*/NULL, -1, depth); + &t, /*tp=*/NULL, -1, depth, true); /* t↓ */ update_tree_entry(&t); @@ -563,7 +569,7 @@ static void ll_diff_tree_paths( } emit_path(tail, base, opt, nparent, - /*t=*/NULL, tp, imin, depth); + /*t=*/NULL, tp, imin, depth, true); skip_emit_tp: /* ∀ pi=p[imin] pi↓ */ -- 2.52.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-12-03 15:06 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-11-30 11:47 [PATCH v2] diff-index: don't queue unchanged filepairs with diff_change() René Scharfe 2025-11-30 18:02 ` Junio C Hamano 2025-12-02 21:16 ` René Scharfe 2025-12-02 22:07 ` René Scharfe 2025-12-03 15:06 ` René Scharfe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).