* [PATCH RESEND] diff-files: fix copy detection
@ 2025-12-14 15:57 René Scharfe
2025-12-15 23:02 ` SZEDER Gábor
0 siblings, 1 reply; 3+ messages in thread
From: René Scharfe @ 2025-12-14 15:57 UTC (permalink / raw)
To: Git List
Copy detection cannot work when comparing the index to the working tree
because Git ignores files that it is not explicitly told to track. It
should work in the other direction, though, i.e. for a reverse diff of
the deletion of a copy from the index.
d1f2d7e8ca (Make run_diff_index() use unpack_trees(), not read_tree(),
2008-01-19) broke it with a seemingly stray change to run_diff_files().
We didn't notice because there's no test for that. But even if we had
one, it might have gone unnoticed because the breakage only happens
with index preloading, which requires at least 1000 entries (more than
most test repos have) and is racy because it runs in parallel with the
actual command.
Fix copy detection by queuing up-to-date and skip-worktree entries using
diff_same().
While at it, use diff_same() also for queuing unchanged files not
flagged as up-to-date, i.e. clean submodules and entries where
preloading was not done at all or not quickly enough. It uses less
memory than diff_change() and doesn't unnecessarily set the diff flag
has_changes.
Add two tests to cover running both without and with preloading. The
first one passes reliably with the original code. The second one
enables preloading and thus is racy. It has a good chance to pass even
without the fix, but fails within seconds when running the test script
with --stress. With the fix it runs fine for several minutes, until
my patience runs out.
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Original submission:
https://lore.kernel.org/git/f2e187bb-c765-4cc3-a0a0-1fbaec9a14e2@web.de/
diff-lib.c | 12 +++++++++---
t/t4007-rename-3.sh | 23 ++++++++++++++++++++++-
2 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/diff-lib.c b/diff-lib.c
index 8e624f38c6..5307390ff3 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -226,8 +226,12 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
continue;
}
- if (ce_uptodate(ce) || ce_skip_worktree(ce))
+ if (ce_uptodate(ce) || ce_skip_worktree(ce)) {
+ if (revs->diffopt.flags.find_copies_harder)
+ diff_same(&revs->diffopt, ce->ce_mode,
+ &ce->oid, ce->name);
continue;
+ }
/*
* When CE_VALID is set (via "update-index --assume-unchanged"
@@ -272,8 +276,10 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
if (!changed && !dirty_submodule) {
ce_mark_uptodate(ce);
mark_fsmonitor_valid(istate, ce);
- if (!revs->diffopt.flags.find_copies_harder)
- continue;
+ if (revs->diffopt.flags.find_copies_harder)
+ diff_same(&revs->diffopt, newmode,
+ &ce->oid, ce->name);
+ continue;
}
oldmode = ce->ce_mode;
old_oid = &ce->oid;
diff --git a/t/t4007-rename-3.sh b/t/t4007-rename-3.sh
index 3fc81bcd76..1012a370dd 100755
--- a/t/t4007-rename-3.sh
+++ b/t/t4007-rename-3.sh
@@ -67,7 +67,28 @@ test_expect_success 'copy, limited to a subtree' '
'
test_expect_success 'tweak work tree' '
- rm -f path0/COPYING &&
+ rm -f path0/COPYING
+'
+
+cat >expected <<EOF
+:100644 100644 $blob $blob C100 path1/COPYING path0/COPYING
+EOF
+
+# The cache has path0/COPYING and path1/COPYING, the working tree only
+# path1/COPYING. This is a deletion -- we don't treat deduplication
+# specially. In reverse it should be detected as a copy, though.
+test_expect_success 'copy detection, files to index' '
+ git diff-files -C --find-copies-harder -R >current &&
+ compare_diff_raw current expected
+'
+
+test_expect_success 'copy detection, files to preloaded index' '
+ GIT_TEST_PRELOAD_INDEX=1 \
+ git diff-files -C --find-copies-harder -R >current &&
+ compare_diff_raw current expected
+'
+
+test_expect_success 'tweak index' '
git update-index --remove path0/COPYING
'
# In the tree, there is only path0/COPYING. In the cache, path0 does
--
2.52.0
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH RESEND] diff-files: fix copy detection
2025-12-14 15:57 [PATCH RESEND] diff-files: fix copy detection René Scharfe
@ 2025-12-15 23:02 ` SZEDER Gábor
2025-12-16 1:21 ` Junio C Hamano
0 siblings, 1 reply; 3+ messages in thread
From: SZEDER Gábor @ 2025-12-15 23:02 UTC (permalink / raw)
To: Junio C Hamano; +Cc: René Scharfe, Git List
On Sun, Dec 14, 2025 at 04:57:06PM +0100, René Scharfe wrote:
> Fix copy detection by queuing up-to-date and skip-worktree entries using
> diff_same().
> diff --git a/diff-lib.c b/diff-lib.c
> index 8e624f38c6..5307390ff3 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -272,8 +276,10 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
> if (!changed && !dirty_submodule) {
> ce_mark_uptodate(ce);
> mark_fsmonitor_valid(istate, ce);
> - if (!revs->diffopt.flags.find_copies_harder)
> - continue;
> + if (revs->diffopt.flags.find_copies_harder)
> + diff_same(&revs->diffopt, newmode,
> + &ce->oid, ce->name);
Junio, this patch should be queued on top of 38f88051da
(diff-index: don't queue unchanged filepairs with diff_change(),
2025-11-30), because diff_same() was introduced in that commit.
~/src/git ((7077c385f9...) %)$ git log --oneline -1
7077c385f9 (HEAD) diff-files: fix copy detection
~/src/git ((7077c385f9...) %)$ make diff-lib.o
CC diff-lib.o
diff-lib.c: In function ‘run_diff_files’:
diff-lib.c:231:33: error: implicit declaration of function ‘diff_same’; did you mean ‘diff_free’? [-Werror=implicit-function-declaration]
231 | diff_same(&revs->diffopt, ce->ce_mode,
| ^~~~~~~~~
| diff_free
cc1: all warnings being treated as errors
make: *** [Makefile:2862: diff-lib.o] Error 1
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [PATCH RESEND] diff-files: fix copy detection
2025-12-15 23:02 ` SZEDER Gábor
@ 2025-12-16 1:21 ` Junio C Hamano
0 siblings, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2025-12-16 1:21 UTC (permalink / raw)
To: SZEDER Gábor; +Cc: René Scharfe, Git List
SZEDER Gábor <szeder.dev@gmail.com> writes:
> On Sun, Dec 14, 2025 at 04:57:06PM +0100, René Scharfe wrote:
>> Fix copy detection by queuing up-to-date and skip-worktree entries using
>> diff_same().
>
>> diff --git a/diff-lib.c b/diff-lib.c
>> index 8e624f38c6..5307390ff3 100644
>> --- a/diff-lib.c
>> +++ b/diff-lib.c
>
>> @@ -272,8 +276,10 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
>> if (!changed && !dirty_submodule) {
>> ce_mark_uptodate(ce);
>> mark_fsmonitor_valid(istate, ce);
>> - if (!revs->diffopt.flags.find_copies_harder)
>> - continue;
>> + if (revs->diffopt.flags.find_copies_harder)
>> + diff_same(&revs->diffopt, newmode,
>> + &ce->oid, ce->name);
>
> Junio, this patch should be queued on top of 38f88051da
> (diff-index: don't queue unchanged filepairs with diff_change(),
> 2025-11-30), because diff_same() was introduced in that commit.
Very true. Thanks.
>
> ~/src/git ((7077c385f9...) %)$ git log --oneline -1
> 7077c385f9 (HEAD) diff-files: fix copy detection
> ~/src/git ((7077c385f9...) %)$ make diff-lib.o
> CC diff-lib.o
> diff-lib.c: In function ‘run_diff_files’:
> diff-lib.c:231:33: error: implicit declaration of function ‘diff_same’; did you mean ‘diff_free’? [-Werror=implicit-function-declaration]
> 231 | diff_same(&revs->diffopt, ce->ce_mode,
> | ^~~~~~~~~
> | diff_free
> cc1: all warnings being treated as errors
> make: *** [Makefile:2862: diff-lib.o] Error 1
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-12-16 1:21 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-14 15:57 [PATCH RESEND] diff-files: fix copy detection René Scharfe
2025-12-15 23:02 ` SZEDER Gábor
2025-12-16 1:21 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).