git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Duy Nguyen <pclouds@gmail.com>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	Nicolas Pitre <nico@fluxnic.net>
Subject: [PATCHv3 0/6] duplicate objects and delta cycles
Date: Fri, 23 Aug 2013 20:01:11 -0400	[thread overview]
Message-ID: <20130824000111.GA20255@sigill.intra.peff.net> (raw)
In-Reply-To: <20130823182409.GA30130@sigill.intra.peff.net>

On Fri, Aug 23, 2013 at 02:24:10PM -0400, Jeff King wrote:

> > This is somewhat tricky and may deserve an in-code comment.
> 
> Do you want me to re-roll[...]

Since there were a few things to fix, I went ahead and re-rolled. Here's
v3, which changes:

  1. Move "edge of run" logic description into the in-code comment
     rather than the commit message.

  2. Include space between shell function names and parentheses.

  3. Use $TEST_DIRECTORY to find lib-pack so that "--root" works.

I added in Nicolas's acks, too, as this version makes no changes of
substance from the previous ack'd version. Interdiff is below.

 sha1-lookup.c                     | 24 ++++++++++++++++++++++++
 t/lib-pack.sh                     | 12 ++++++------
 t/t5308-pack-detect-duplicates.sh |  4 ++--
 t/t5309-pack-delta-cycles.sh      |  2 +-
 4 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/sha1-lookup.c b/sha1-lookup.c
index 614cbb6..2dd8515 100644
--- a/sha1-lookup.c
+++ b/sha1-lookup.c
@@ -215,6 +215,30 @@ int sha1_entry_pos(const void *table,
 			 * Either one of these entries is it (and
 			 * we do not care which), or we do not have
 			 * it.
+			 *
+			 * Furthermore, we know that one of our
+			 * endpoints must be the edge of the run of
+			 * duplicates. For example, given this
+			 * sequence:
+			 *
+			 *     idx 0 1 2 3 4 5
+			 *     key A C C C C D
+			 *
+			 * If we are searching for "B", we might
+			 * hit the duplicate run at lo=1, hi=3
+			 * (e.g., by first mi=3, then mi=0). But we
+			 * can never have lo > 1, because B < C.
+			 * That is, if our key is less than the
+			 * run, we know that "lo" is the edge, but
+			 * we can say nothing of "hi". Similarly,
+			 * if our key is greater than the run, we
+			 * know that "hi" is the edge, but we can
+			 * say nothing of "lo".
+			 *
+			 * Therefore if we do not find it, we also
+			 * know where it would go if it did exist:
+			 * just on the far side of the edge that we
+			 * know about.
 			 */
 			if (ofs == 20) {
 				mi = lo;
diff --git a/t/lib-pack.sh b/t/lib-pack.sh
index e028c40..7e8685b 100644
--- a/t/lib-pack.sh
+++ b/t/lib-pack.sh
@@ -10,7 +10,7 @@
 #   pack_trailer foo.pack
 
 # Print the big-endian 4-byte octal representation of $1
-uint32_octal() {
+uint32_octal () {
 	n=$1
 	printf '\%o' $(($n / 16777216)); n=$((n % 16777216))
 	printf '\%o' $(($n /    65536)); n=$((n %    65536))
@@ -19,12 +19,12 @@ uint32_binary() {
 }
 
 # Print the big-endian 4-byte binary representation of $1
-uint32_binary() {
+uint32_binary () {
 	printf "$(uint32_octal "$1")"
 }
 
 # Print a pack header, version 2, for a pack with $1 objects
-pack_header() {
+pack_header () {
 	printf 'PACK' &&
 	printf '\0\0\0\2' &&
 	uint32_binary "$1"
@@ -36,7 +36,7 @@ pack_header() {
 # Doing this on the fly (especially picking your deltas) is quite tricky, so we
 # have hardcoded some well-known objects. See the case statements below for the
 # complete list.
-pack_obj() {
+pack_obj () {
 	case "$1" in
 	# empty blob
 	e69de29bb2d1d6434b8b29ae775ad8c2e48c5391)
@@ -86,7 +86,7 @@ pack_obj() {
 }
 
 # Compute and append pack trailer to "$1"
-pack_trailer() {
+pack_trailer () {
 	test-sha1 -b <"$1" >trailer.tmp &&
 	cat trailer.tmp >>"$1" &&
 	rm -f trailer.tmp
@@ -95,6 +95,6 @@ pack_trailer() {
 # Remove any existing packs to make sure that
 # whatever we index next will be the pack that we
 # actually use.
-clear_packs() {
+clear_packs () {
 	rm -f .git/objects/pack/*
 }
diff --git a/t/t5308-pack-detect-duplicates.sh b/t/t5308-pack-detect-duplicates.sh
index e982095..e0ba5e1 100755
--- a/t/t5308-pack-detect-duplicates.sh
+++ b/t/t5308-pack-detect-duplicates.sh
@@ -2,7 +2,7 @@ test_description='handling of duplicate objects in incoming packfiles'
 
 test_description='handling of duplicate objects in incoming packfiles'
 . ./test-lib.sh
-. ../lib-pack.sh
+. "$TEST_DIRECTORY"/lib-pack.sh
 
 # The sha1s we have in our pack. It's important that these have the same
 # starting byte, so that they end up in the same fanout section of the index.
@@ -22,7 +22,7 @@ MISSING_SHA1='e69d000000000000000000000000000000000000'
 # $1 is the name of the packfile to create
 #
 # $2 is the number of times to duplicate each object
-create_pack() {
+create_pack () {
 	pack_header "$((2 * $2))" >"$1" &&
 	for i in $(test_seq 1 "$2"); do
 		pack_obj $LO_SHA1 &&
diff --git a/t/t5309-pack-delta-cycles.sh b/t/t5309-pack-delta-cycles.sh
index 38e1809..3e7861b 100755
--- a/t/t5309-pack-delta-cycles.sh
+++ b/t/t5309-pack-delta-cycles.sh
@@ -2,7 +2,7 @@ test_description='test index-pack handling of delta cycles in packfiles'
 
 test_description='test index-pack handling of delta cycles in packfiles'
 . ./test-lib.sh
-. ../lib-pack.sh
+. "$TEST_DIRECTORY"/lib-pack.sh
 
 # Two similar-ish objects that we have computed deltas between.
 A=01d7713666f4de822776c7622c10f1b07de280dc

  parent reply	other threads:[~2013-08-24  0:01 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-14 18:17 duplicate objects in packfile Jeff King
2013-08-14 18:29 ` Junio C Hamano
2013-08-14 18:39   ` Junio C Hamano
2013-08-14 18:54     ` Nicolas Pitre
2013-08-16 15:01     ` Jeff King
2013-08-21 20:49       ` [RFC/PATCH 0/4] duplicate objects in packfiles Jeff King
2013-08-21 20:51         ` [PATCH 1/4] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-21 20:52         ` [PATCH 2/4] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-22 13:45           ` Duy Nguyen
2013-08-22 14:43             ` Jeff King
2013-08-22 23:12               ` [PATCHv2 0/6] duplicate objects and delta cycles, oh my! Jeff King
2013-08-22 23:12                 ` [PATCH 1/6] test-sha1: add a binary output mode Jeff King
2013-08-22 23:14                 ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-23 16:41                   ` Junio C Hamano
2013-08-23 18:24                     ` Jeff King
2013-08-23 18:54                       ` Nicolas Pitre
2013-08-23 18:56                         ` Jeff King
2013-08-24  0:01                       ` Jeff King [this message]
2013-08-24  0:01                         ` [PATCH 1/6] test-sha1: add a binary output mode Jeff King
2013-08-24  0:02                         ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-24  0:02                         ` [PATCH 3/6] add tests for indexing packs with delta cycles Jeff King
2013-08-24  0:02                         ` [PATCH 4/6] test index-pack on packs with recoverable " Jeff King
2013-08-24  0:02                         ` [PATCH 5/6] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-24  0:02                         ` [PATCH 6/6] default pack.indexDuplicates to false Jeff King
2013-08-23 19:41                   ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Johannes Sixt
2013-08-23 23:37                     ` Jeff King
2013-08-22 23:14                 ` [PATCH 3/6] add tests for indexing packs with delta cycles Jeff King
2013-08-22 23:15                 ` [PATCH 4/6] test index-pack on packs with recoverable " Jeff King
2013-08-22 23:15                 ` [PATCH 5/6] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-22 23:16                 ` [PATCH 6/6] default pack.indexDuplicates to false Jeff King
2013-08-22 23:35                 ` [PATCHv2 0/6] duplicate objects and delta cycles, oh my! Nicolas Pitre
2013-08-21 20:53         ` [PATCH 3/4] reject duplicates when indexing a packfile we created Jeff King
2013-08-21 20:55         ` [DO NOT APPLY PATCH 4/4] index-pack: optionally skip duplicate packfile entries Jeff King
2013-08-21 23:20           ` Junio C Hamano
2013-08-22  0:47             ` Jeff King
2013-08-21 22:17         ` [RFC/PATCH 0/4] duplicate objects in packfiles Junio C Hamano
2013-08-14 18:50 ` duplicate objects in packfile Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130824000111.GA20255@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=nico@fluxnic.net \
    --cc=pclouds@gmail.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).