* [PATCH] diffcore-delta: avoid ignoring final 'line' of file
@ 2024-01-11 20:47 Elijah Newren via GitGitGadget
From: Elijah Newren via GitGitGadget @ 2024-01-11 20:47 UTC
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

hash_chars() hashes each 'line' of a file to an integer and stores the
result in a spanhash, where a 'line' ends at a newline or after 64
characters, whichever comes first.  A new spanhash entry was only
created when one of those two conditions was met.  The problem is that
the final part of the file might be shorter than 64 characters and
might not end with a newline, in which case it was never hashed at all.
This could, for example, cause an 85-byte file with 12 lines, in which
only the first character of the file differs, to appear merely 23%
similar rather than the expected 97%.  Ensure the final 'line' is
included, and add a testcase that would have caught this problem.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
    Found while experimenting with converting portions of diffcore-delta to
    Rust.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1637%2Fnewren%2Ffix-diffcore-final-line-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1637/newren/fix-diffcore-final-line-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1637

 diffcore-delta.c       |  4 ++++
 t/t4001-diff-rename.sh | 19 +++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/diffcore-delta.c b/diffcore-delta.c
index c30b56e983b..7136c3dd203 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -159,6 +159,10 @@ static struct spanhash_top *hash_chars(struct repository *r,
 		n = 0;
 		accum1 = accum2 = 0;
 	}
+	if (n > 0) {
+		hashval = (accum1 + accum2 * 0x61) % HASHBASE;
+		hash = add_spanhash(hash, hashval, n);
+	}
 	QSORT(hash->data, (size_t)1ul << hash->alloc_log2, spanhash_cmp);
 	return hash;
 }
diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh
index 85be1367de6..29299acbce7 100755
--- a/t/t4001-diff-rename.sh
+++ b/t/t4001-diff-rename.sh
@@ -286,4 +286,23 @@ test_expect_success 'basename similarity vs best similarity' '
 	test_cmp expected actual
 '
 
+test_expect_success 'last line matters too' '
+	test_write_lines a 0 1 2 3 4 5 6 7 8 9 >nonewline &&
+	printf "git ignores final up to 63 characters if not newline terminated" >>nonewline &&
+	git add nonewline &&
+	git commit -m "original version of file with no final newline" &&
+
+	# Change ONLY the first character of the whole file
+	test_write_lines b 0 1 2 3 4 5 6 7 8 9 >nonewline &&
+	printf "git ignores final up to 63 characters if not newline terminated" >>nonewline &&
+	git add nonewline &&
+	git mv nonewline still-no-newline &&
+	git commit -a -m "rename nonewline -> still-no-newline" &&
+	git diff-tree -r -M01 --name-status HEAD^ HEAD >actual &&
+	cat >expected <<-\EOF &&
+	R097	nonewline	still-no-newline
+	EOF
+	test_cmp expected actual
+'
+
 test_done

base-commit: 055bb6e9969085777b7fab83e3fee0017654f134
-- 
gitgitgadget


Thread overview: 10+ messages
2024-01-11 20:47 [PATCH] diffcore-delta: avoid ignoring final 'line' of file Elijah Newren via GitGitGadget
2024-01-11 21:45 ` Taylor Blau
2024-01-11 23:00 ` Junio C Hamano
2024-01-13  1:45   ` Elijah Newren
2024-01-13  6:21     ` Junio C Hamano
2024-01-19  1:54       ` Elijah Newren
2024-01-19  3:06         ` Junio C Hamano
2024-01-19  5:05           ` Elijah Newren
2024-01-19  6:27             ` Junio C Hamano
2024-01-13  4:26 ` [PATCH v2] " Elijah Newren via GitGitGadget
