From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git Mailing List <git@vger.kernel.org>,
Duy Nguyen <pclouds@gmail.com>,
"Shawn O. Pearce" <spearce@spearce.org>,
Nicolas Pitre <nico@fluxnic.net>
Subject: [PATCHv3 0/6] duplicate objects and delta cycles
Date: Fri, 23 Aug 2013 20:01:11 -0400 [thread overview]
Message-ID: <20130824000111.GA20255@sigill.intra.peff.net> (raw)
In-Reply-To: <20130823182409.GA30130@sigill.intra.peff.net>
On Fri, Aug 23, 2013 at 02:24:10PM -0400, Jeff King wrote:
> > This is somewhat tricky and may deserve an in-code comment.
>
> Do you want me to re-roll[...]
Since there were a few things to fix, I went ahead and re-rolled. Here's
v3, which changes:
1. Move "edge of run" logic description into the in-code comment
rather than the commit message.
2. Include space between shell function names and parentheses.
3. Use $TEST_DIRECTORY to find lib-pack so that "--root" works.
I added in Nicolas's acks, too, as this version makes no changes of
substance from the previous ack'd version. Interdiff is below.
sha1-lookup.c | 24 ++++++++++++++++++++++++
t/lib-pack.sh | 12 ++++++------
t/t5308-pack-detect-duplicates.sh | 4 ++--
t/t5309-pack-delta-cycles.sh | 2 +-
4 files changed, 33 insertions(+), 9 deletions(-)
diff --git a/sha1-lookup.c b/sha1-lookup.c
index 614cbb6..2dd8515 100644
--- a/sha1-lookup.c
+++ b/sha1-lookup.c
@@ -215,6 +215,30 @@ int sha1_entry_pos(const void *table,
* Either one of these entries is it (and
* we do not care which), or we do not have
* it.
+ *
+ * Furthermore, we know that one of our
+ * endpoints must be the edge of the run of
+ * duplicates. For example, given this
+ * sequence:
+ *
+ * idx 0 1 2 3 4 5
+ * key A C C C C D
+ *
+ * If we are searching for "B", we might
+ * hit the duplicate run at lo=1, hi=3
+ * (e.g., by first mi=3, then mi=0). But we
+ * can never have lo > 1, because B < C.
+ * That is, if our key is less than the
+ * run, we know that "lo" is the edge, but
+ * we can say nothing of "hi". Similarly,
+ * if our key is greater than the run, we
+ * know that "hi" is the edge, but we can
+ * say nothing of "lo".
+ *
+ * Therefore if we do not find it, we also
+ * know where it would go if it did exist:
+ * just on the far side of the edge that we
+ * know about.
*/
if (ofs == 20) {
mi = lo;
diff --git a/t/lib-pack.sh b/t/lib-pack.sh
index e028c40..7e8685b 100644
--- a/t/lib-pack.sh
+++ b/t/lib-pack.sh
@@ -10,7 +10,7 @@
# pack_trailer foo.pack
# Print the big-endian 4-byte octal representation of $1
-uint32_octal() {
+uint32_octal () {
n=$1
printf '\%o' $(($n / 16777216)); n=$((n % 16777216))
printf '\%o' $(($n / 65536)); n=$((n % 65536))
@@ -19,12 +19,12 @@ uint32_binary() {
}
# Print the big-endian 4-byte binary representation of $1
-uint32_binary() {
+uint32_binary () {
printf "$(uint32_octal "$1")"
}
# Print a pack header, version 2, for a pack with $1 objects
-pack_header() {
+pack_header () {
printf 'PACK' &&
printf '\0\0\0\2' &&
uint32_binary "$1"
@@ -36,7 +36,7 @@ pack_header() {
# Doing this on the fly (especially picking your deltas) is quite tricky, so we
# have hardcoded some well-known objects. See the case statements below for the
# complete list.
-pack_obj() {
+pack_obj () {
case "$1" in
# empty blob
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391)
@@ -86,7 +86,7 @@ pack_obj() {
}
# Compute and append pack trailer to "$1"
-pack_trailer() {
+pack_trailer () {
test-sha1 -b <"$1" >trailer.tmp &&
cat trailer.tmp >>"$1" &&
rm -f trailer.tmp
@@ -95,6 +95,6 @@ pack_trailer() {
# Remove any existing packs to make sure that
# whatever we index next will be the pack that we
# actually use.
-clear_packs() {
+clear_packs () {
rm -f .git/objects/pack/*
}
diff --git a/t/t5308-pack-detect-duplicates.sh b/t/t5308-pack-detect-duplicates.sh
index e982095..e0ba5e1 100755
--- a/t/t5308-pack-detect-duplicates.sh
+++ b/t/t5308-pack-detect-duplicates.sh
@@ -2,7 +2,7 @@ test_description='handling of duplicate objects in incoming packfiles'
test_description='handling of duplicate objects in incoming packfiles'
. ./test-lib.sh
-. ../lib-pack.sh
+. "$TEST_DIRECTORY"/lib-pack.sh
# The sha1s we have in our pack. It's important that these have the same
# starting byte, so that they end up in the same fanout section of the index.
@@ -22,7 +22,7 @@ MISSING_SHA1='e69d000000000000000000000000000000000000'
# $1 is the name of the packfile to create
#
# $2 is the number of times to duplicate each object
-create_pack() {
+create_pack () {
pack_header "$((2 * $2))" >"$1" &&
for i in $(test_seq 1 "$2"); do
pack_obj $LO_SHA1 &&
diff --git a/t/t5309-pack-delta-cycles.sh b/t/t5309-pack-delta-cycles.sh
index 38e1809..3e7861b 100755
--- a/t/t5309-pack-delta-cycles.sh
+++ b/t/t5309-pack-delta-cycles.sh
@@ -2,7 +2,7 @@ test_description='test index-pack handling of delta cycles in packfiles'
test_description='test index-pack handling of delta cycles in packfiles'
. ./test-lib.sh
-. ../lib-pack.sh
+. "$TEST_DIRECTORY"/lib-pack.sh
# Two similar-ish objects that we have computed deltas between.
A=01d7713666f4de822776c7622c10f1b07de280dc
next prev parent reply other threads:[~2013-08-24 0:01 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-14 18:17 duplicate objects in packfile Jeff King
2013-08-14 18:29 ` Junio C Hamano
2013-08-14 18:39 ` Junio C Hamano
2013-08-14 18:54 ` Nicolas Pitre
2013-08-16 15:01 ` Jeff King
2013-08-21 20:49 ` [RFC/PATCH 0/4] duplicate objects in packfiles Jeff King
2013-08-21 20:51 ` [PATCH 1/4] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-21 20:52 ` [PATCH 2/4] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-22 13:45 ` Duy Nguyen
2013-08-22 14:43 ` Jeff King
2013-08-22 23:12 ` [PATCHv2 0/6] duplicate objects and delta cycles, oh my! Jeff King
2013-08-22 23:12 ` [PATCH 1/6] test-sha1: add a binary output mode Jeff King
2013-08-22 23:14 ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-23 16:41 ` Junio C Hamano
2013-08-23 18:24 ` Jeff King
2013-08-23 18:54 ` Nicolas Pitre
2013-08-23 18:56 ` Jeff King
2013-08-24 0:01 ` Jeff King [this message]
2013-08-24 0:01 ` [PATCH 1/6] test-sha1: add a binary output mode Jeff King
2013-08-24 0:02 ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-24 0:02 ` [PATCH 3/6] add tests for indexing packs with delta cycles Jeff King
2013-08-24 0:02 ` [PATCH 4/6] test index-pack on packs with recoverable " Jeff King
2013-08-24 0:02 ` [PATCH 5/6] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-24 0:02 ` [PATCH 6/6] default pack.indexDuplicates to false Jeff King
2013-08-23 19:41 ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Johannes Sixt
2013-08-23 23:37 ` Jeff King
2013-08-22 23:14 ` [PATCH 3/6] add tests for indexing packs with delta cycles Jeff King
2013-08-22 23:15 ` [PATCH 4/6] test index-pack on packs with recoverable " Jeff King
2013-08-22 23:15 ` [PATCH 5/6] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-22 23:16 ` [PATCH 6/6] default pack.indexDuplicates to false Jeff King
2013-08-22 23:35 ` [PATCHv2 0/6] duplicate objects and delta cycles, oh my! Nicolas Pitre
2013-08-21 20:53 ` [PATCH 3/4] reject duplicates when indexing a packfile we created Jeff King
2013-08-21 20:55 ` [DO NOT APPLY PATCH 4/4] index-pack: optionally skip duplicate packfile entries Jeff King
2013-08-21 23:20 ` Junio C Hamano
2013-08-22 0:47 ` Jeff King
2013-08-21 22:17 ` [RFC/PATCH 0/4] duplicate objects in packfiles Junio C Hamano
2013-08-14 18:50 ` duplicate objects in packfile Nicolas Pitre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130824000111.GA20255@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=nico@fluxnic.net \
--cc=pclouds@gmail.com \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).