From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>
Subject: [PATCH] diffcore-delta: avoid ignoring final 'line' of file
Date: Thu, 11 Jan 2024 20:47:54 +0000
Message-ID: <pull.1637.git.1705006074626.gitgitgadget@gmail.com>
From: Elijah Newren <newren@gmail.com>
hash_chars() would hash lines to integers and store them in a spanhash,
but cut lines at 64 characters. Thus, whenever it reached 64 characters
or a newline, it would add a new entry to the spanhash. The problem is
that the final part of the file might not end exactly 64 characters
after the previous 'line' and might not end with a newline, in which
case that trailing part was never added to the spanhash. This could,
for example, cause an 85-byte file with 12 lines and only the first
character in the file differing to appear merely 23% similar rather
than the expected 97%.

Ensure the last line is included, and add a testcase that would have
caught this problem.
Signed-off-by: Elijah Newren <newren@gmail.com>
---
diffcore-delta: avoid ignoring final 'line' of file
Found while experimenting with converting portions of diffcore-delta to
Rust.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1637%2Fnewren%2Ffix-diffcore-final-line-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1637/newren/fix-diffcore-final-line-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1637
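
For anyone who wants to check the 23% and 97% figures from the commit
message by hand, here is a minimal standalone sketch of the span
counting. It is not the code in diffcore-delta.c: span_table,
add_span(), hash_spans(), similarity_percent() and the flush_tail
switch are hypothetical names used only for illustration, a small
linear table stands in for the real spanhash, and the accumulator
mixing and the HASHBASE value are assumptions modeled on
diffcore-delta.c. The score here is simply the bytes of the source
found in the destination, as a percentage of the larger file size.

```c
#include <assert.h>
#include <stddef.h>

#define HASHBASE 107927u	/* assumed to match diffcore-delta.c */
#define MAX_SPANS 256		/* hypothetical cap, ample for small inputs */

/* Hypothetical stand-in for git's spanhash: a linear (hash, count) list. */
struct span_table {
	unsigned int hash[MAX_SPANS];
	unsigned int cnt[MAX_SPANS];
	size_t nr;
};

/* Add n bytes to the entry for hashval, creating the entry if needed. */
static void add_span(struct span_table *t, unsigned int hashval, unsigned int n)
{
	size_t i;

	for (i = 0; i < t->nr; i++) {
		if (t->hash[i] == hashval) {
			t->cnt[i] += n;
			return;
		}
	}
	assert(t->nr < MAX_SPANS);
	t->hash[t->nr] = hashval;
	t->cnt[t->nr] = n;
	t->nr++;
}

/*
 * Split buf into spans ending at a newline or after 64 bytes and record
 * each span's size under its hash. With flush_tail == 0 a trailing
 * partial span is dropped (the old behavior); otherwise it is recorded.
 */
static void hash_spans(const char *buf, size_t sz, int flush_tail,
		       struct span_table *t)
{
	unsigned int accum1 = 0, accum2 = 0, n = 0;
	size_t i;

	t->nr = 0;
	for (i = 0; i < sz; i++) {
		unsigned int c = (unsigned char)buf[i];
		unsigned int old_1 = accum1;

		accum1 = (accum1 << 7) ^ (accum2 >> 25);
		accum2 = (accum2 << 7) ^ (old_1 >> 25);
		accum1 += c;
		if (++n < 64 && c != '\n')
			continue;
		add_span(t, (accum1 + accum2 * 0x61) % HASHBASE, n);
		n = 0;
		accum1 = accum2 = 0;
	}
	/* The step this patch adds: account for the unterminated tail. */
	if (flush_tail && n > 0)
		add_span(t, (accum1 + accum2 * 0x61) % HASHBASE, n);
}

/* Bytes of src also found in dst, as a percentage of the larger size. */
static unsigned int similarity_percent(const char *src, size_t src_sz,
				       const char *dst, size_t dst_sz,
				       int flush_tail)
{
	struct span_table s, d;
	size_t max_sz = src_sz > dst_sz ? src_sz : dst_sz;
	size_t copied = 0, i, j;

	assert(max_sz > 0);
	hash_spans(src, src_sz, flush_tail, &s);
	hash_spans(dst, dst_sz, flush_tail, &d);
	for (i = 0; i < s.nr; i++) {
		for (j = 0; j < d.nr; j++) {
			if (s.hash[i] == d.hash[j]) {
				copied += s.cnt[i] < d.cnt[j] ? s.cnt[i] : d.cnt[j];
				break;
			}
		}
	}
	return (unsigned int)(copied * 100 / max_sz);
}
```

Feeding the two versions of the test file (eleven 2-byte lines plus the
63-byte unterminated tail, 85 bytes in total) to similarity_percent()
gives 23 with flush_tail set to 0, since only 20 of 85 bytes are
counted as common, and 97 with flush_tail set to 1, since 83 of 85
bytes are.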
diffcore-delta.c | 4 ++++
t/t4001-diff-rename.sh | 19 +++++++++++++++++++
2 files changed, 23 insertions(+)
diff --git a/diffcore-delta.c b/diffcore-delta.c
index c30b56e983b..7136c3dd203 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -159,6 +159,10 @@ static struct spanhash_top *hash_chars(struct repository *r,
 		n = 0;
 		accum1 = accum2 = 0;
 	}
+	if (n > 0) {
+		hashval = (accum1 + accum2 * 0x61) % HASHBASE;
+		hash = add_spanhash(hash, hashval, n);
+	}
 	QSORT(hash->data, (size_t)1ul << hash->alloc_log2, spanhash_cmp);
 	return hash;
 }
diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh
index 85be1367de6..29299acbce7 100755
--- a/t/t4001-diff-rename.sh
+++ b/t/t4001-diff-rename.sh
@@ -286,4 +286,23 @@ test_expect_success 'basename similarity vs best similarity' '
 	test_cmp expected actual
 '

+test_expect_success 'last line matters too' '
+	test_write_lines a 0 1 2 3 4 5 6 7 8 9 >nonewline &&
+	printf "git ignores final up to 63 characters if not newline terminated" >>nonewline &&
+	git add nonewline &&
+	git commit -m "original version of file with no final newline" &&
+
+	# Change ONLY the first character of the whole file
+	test_write_lines b 0 1 2 3 4 5 6 7 8 9 >nonewline &&
+	printf "git ignores final up to 63 characters if not newline terminated" >>nonewline &&
+	git add nonewline &&
+	git mv nonewline still-no-newline &&
+	git commit -a -m "rename nonewline -> still-no-newline" &&
+	git diff-tree -r -M01 --name-status HEAD^ HEAD >actual &&
+	cat >expected <<-\EOF &&
+	R097	nonewline	still-no-newline
+	EOF
+	test_cmp expected actual
+'
+
 test_done
base-commit: 055bb6e9969085777b7fab83e3fee0017654f134
--
gitgitgadget