All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Git Mailing List <git@vger.kernel.org>,
	Duy Nguyen <pclouds@gmail.com>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	Nicolas Pitre <nico@fluxnic.net>
Subject: Re: [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP
Date: Fri, 23 Aug 2013 09:41:57 -0700	[thread overview]
Message-ID: <xmqqob8opdey.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <20130822231404.GB17060@sigill.intra.peff.net> (Jeff King's message of "Thu, 22 Aug 2013 19:14:04 -0400")

Jeff King <peff@peff.net> writes:

> Furthermore, we know that one of our endpoints must be
> the edge of the run of duplicates. For example, given this
> sequence:
>
>  idx 0 1 2 3 4 5
>  key A C C C C D
>
> If we are searching for "B", we might hit the duplicate run
> at lo=1, hi=3 (e.g., by first mi=3, then mi=0). But we can
> never have lo > 1, because B < C. That is, if our key is
> less than the run, we know that "lo" is the edge, but we can
> say nothing of "hi". Similarly, if our key is greater than
> the run, we know that "hi" is the edge, but we can say
> nothing of "lo". But that is enough for us to return not
> only "not found", but show the position at which we would
> insert the new item.

This is somewhat tricky and may deserve an in-code comment.

> diff --git a/sha1-lookup.c b/sha1-lookup.c
> index c4dc55d..614cbb6 100644
> --- a/sha1-lookup.c
> +++ b/sha1-lookup.c
> @@ -204,7 +204,30 @@ int sha1_entry_pos(const void *table,
>  			 * byte 0 thru (ofs-1) are the same between
>  			 * lo and hi; ofs is the first byte that is
>  			 * different.
> +			 *
> +			 * If ofs==20, then no bytes are different,
> +			 * meaning we have entries with duplicate
> +			 * keys. We know that we are in a solid run
> +			 * of this entry (because the entries are
> +			 * sorted, and our lo and hi are the same,
> +			 * there can be nothing but this single key
> +			 * in between). So we can stop the search.
> +			 * Either one of these entries is it (and
> +			 * we do not care which), or we do not have
> +			 * it.
>  			 */
> +			if (ofs == 20) {
> +				mi = lo;
> +				mi_key = base + elem_size * mi + key_offset;
> +				cmp = memcmp(mi_key, key, 20);

It think we already know that mi_key[0:ofs_0] and key[0:ofs_0] are
the same at this point and we do not have to compare full 20 bytes
again, but this is done only once and a better readablity of the
above trumps micro-optimization possibility, I think.

> +				if (!cmp)
> +					return mi;
> +				if (cmp < 0)
> +					return -1 - hi;
> +				else
> +					return -1 - lo;
> +			}
> +
>  			hiv = hi_key[ofs_0];
>  			if (ofs_0 < 19)
>  				hiv = (hiv << 8) | hi_key[ofs_0+1];
> diff --git a/t/lib-pack.sh b/t/lib-pack.sh
> new file mode 100644
> index 0000000..61c5d19
> --- /dev/null
> +++ b/t/lib-pack.sh
> @@ -0,0 +1,78 @@
> +#!/bin/sh
> +#
> +# Support routines for hand-crafting weird or malicious packs.
> +#
> +# You can make a complete pack like:
> +#
> +#   pack_header 2 >foo.pack &&
> +#   pack_obj e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 >>foo.pack &&
> +#   pack_obj e68fe8129b546b101aee9510c5328e7f21ca1d18 >>foo.pack &&
> +#   pack_trailer foo.pack
> +
> +# Print the big-endian 4-byte octal representation of $1
> +uint32_octal() {

micronit (style):

	uint32_octal () {

> +	n=$1
> +	printf '\%o' $(($n / 16777216)); n=$((n % 16777216))
> +	printf '\%o' $(($n /    65536)); n=$((n %    65536))
> +	printf '\%o' $(($n /      256)); n=$((n %      256))
> +	printf '\%o' $(($n           ));
> +}
> +
> +# Print the big-endian 4-byte binary representation of $1
> +uint32_binary() {
> +	printf "$(uint32_octal "$1")"
> +}
> +
> +# Print a pack header, version 2, for a pack with $1 objects
> +pack_header() {
> +	printf 'PACK' &&
> +	printf '\0\0\0\2' &&
> +	uint32_binary "$1"
> +}
> +
> +# Print the pack data for object $1, as a delta against object $2 (or as a full
> +# object if $2 is missing or empty). The output is suitable for including
> +# directly in the packfile, and represents the entirety of the object entry.
> +# Doing this on the fly (especially picking your deltas) is quite tricky, so we
> +# have hardcoded some well-known objects. See the case statements below for the
> +# complete list.

Cute ;-) I like the idea of having this function with a right API in
place, and cheating by limiting its implementation to what is
necessary.

Thanks.

  reply	other threads:[~2013-08-23 16:42 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-14 18:17 duplicate objects in packfile Jeff King
2013-08-14 18:29 ` Junio C Hamano
2013-08-14 18:39   ` Junio C Hamano
2013-08-14 18:54     ` Nicolas Pitre
2013-08-16 15:01     ` Jeff King
2013-08-21 20:49       ` [RFC/PATCH 0/4] duplicate objects in packfiles Jeff King
2013-08-21 20:51         ` [PATCH 1/4] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-21 20:52         ` [PATCH 2/4] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-22 13:45           ` Duy Nguyen
2013-08-22 14:43             ` Jeff King
2013-08-22 23:12               ` [PATCHv2 0/6] duplicate objects and delta cycles, oh my! Jeff King
2013-08-22 23:12                 ` [PATCH 1/6] test-sha1: add a binary output mode Jeff King
2013-08-22 23:14                 ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-23 16:41                   ` Junio C Hamano [this message]
2013-08-23 18:24                     ` Jeff King
2013-08-23 18:54                       ` Nicolas Pitre
2013-08-23 18:56                         ` Jeff King
2013-08-24  0:01                       ` [PATCHv3 0/6] duplicate objects and delta cycles Jeff King
2013-08-24  0:01                         ` [PATCH 1/6] test-sha1: add a binary output mode Jeff King
2013-08-24  0:02                         ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-24  0:02                         ` [PATCH 3/6] add tests for indexing packs with delta cycles Jeff King
2013-08-24  0:02                         ` [PATCH 4/6] test index-pack on packs with recoverable " Jeff King
2013-08-24  0:02                         ` [PATCH 5/6] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-24  0:02                         ` [PATCH 6/6] default pack.indexDuplicates to false Jeff King
2013-08-23 19:41                   ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Johannes Sixt
2013-08-23 23:37                     ` Jeff King
2013-08-22 23:14                 ` [PATCH 3/6] add tests for indexing packs with delta cycles Jeff King
2013-08-22 23:15                 ` [PATCH 4/6] test index-pack on packs with recoverable " Jeff King
2013-08-22 23:15                 ` [PATCH 5/6] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-22 23:16                 ` [PATCH 6/6] default pack.indexDuplicates to false Jeff King
2013-08-22 23:35                 ` [PATCHv2 0/6] duplicate objects and delta cycles, oh my! Nicolas Pitre
2013-08-21 20:53         ` [PATCH 3/4] reject duplicates when indexing a packfile we created Jeff King
2013-08-21 20:55         ` [DO NOT APPLY PATCH 4/4] index-pack: optionally skip duplicate packfile entries Jeff King
2013-08-21 23:20           ` Junio C Hamano
2013-08-22  0:47             ` Jeff King
2013-08-21 22:17         ` [RFC/PATCH 0/4] duplicate objects in packfiles Junio C Hamano
2013-08-14 18:50 ` duplicate objects in packfile Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqob8opdey.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=nico@fluxnic.net \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.