From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, "Shawn O. Pearce" <spearce@spearce.org>,
Nicolas Pitre <nico@fluxnic.net>
Subject: Re: [DO NOT APPLY PATCH 4/4] index-pack: optionally skip duplicate packfile entries
Date: Wed, 21 Aug 2013 16:20:07 -0700 [thread overview]
Message-ID: <xmqqob8qskbc.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <20130821205555.GD28165@sigill.intra.peff.net> (Jeff King's message of "Wed, 21 Aug 2013 16:55:55 -0400")
Jeff King <peff@peff.net> writes:
> When we are building the pack index, we can notice that
> there are duplicate objects, pick one "winner" instance, and
> mention the object only once in the index (mapped to the
> winner's offset).
>
> This has the effect that the duplicate packfile entries are
> never found by lookup. The data still exists in the
> packfile, though, and can be used as a delta base if delta
> base offsets are used. If delta refs are used, then it is
> possible that some deltas may be broken.
I do not understand the last bit. If two copies of an object exist
but you have only one slot for the object in the index, and another
object names it as its base with ref-delta, then reconstituting it
should work just fine---whichever representation of the base object
is recorded in the .idx, that first needs to be reconstituted before
the delta is applied to it, and both copies should yield identical
contents for the delta base object, no?
In any case, ejecting one from the pack .idx would not help in the
presense of either broken or malicious packer that reuses delta too
aggressively. Suppose you have objects A and B and somehow manage
to create a cycle of deltas, A names B as its delta base and B names
A as its delta base. The packer may notice its mistake and then add
another copy of A as a base object. The pack contains two copies of
A (one is delta based on B, the other is full) and B (delta against
A).
If B refers to the copy of A that is delta against B using ofs-delta,
fixing the .idx file will have no effect. read_packed_sha1(B) will
read the delta data of B, finds the offset to start reading the data
for A which was excised from the .idx and unless that codepath is
updated to consult revindex (i.e. you mark one copy of A in the .pack
as bad, and when B refers to that bad copy of A via ofs-delta, you
check the offset against revindex to get the object name of A and go
to the good copy of A), you will never finish reading B because
reading the bad copy of A will lead you to first reconstitute B.
> I think this line of "fixing" should probably be scrapped.
I tend to agree.
next prev parent reply other threads:[~2013-08-21 23:20 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-14 18:17 duplicate objects in packfile Jeff King
2013-08-14 18:29 ` Junio C Hamano
2013-08-14 18:39 ` Junio C Hamano
2013-08-14 18:54 ` Nicolas Pitre
2013-08-16 15:01 ` Jeff King
2013-08-21 20:49 ` [RFC/PATCH 0/4] duplicate objects in packfiles Jeff King
2013-08-21 20:51 ` [PATCH 1/4] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-21 20:52 ` [PATCH 2/4] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-22 13:45 ` Duy Nguyen
2013-08-22 14:43 ` Jeff King
2013-08-22 23:12 ` [PATCHv2 0/6] duplicate objects and delta cycles, oh my! Jeff King
2013-08-22 23:12 ` [PATCH 1/6] test-sha1: add a binary output mode Jeff King
2013-08-22 23:14 ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-23 16:41 ` Junio C Hamano
2013-08-23 18:24 ` Jeff King
2013-08-23 18:54 ` Nicolas Pitre
2013-08-23 18:56 ` Jeff King
2013-08-24 0:01 ` [PATCHv3 0/6] duplicate objects and delta cycles Jeff King
2013-08-24 0:01 ` [PATCH 1/6] test-sha1: add a binary output mode Jeff King
2013-08-24 0:02 ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-24 0:02 ` [PATCH 3/6] add tests for indexing packs with delta cycles Jeff King
2013-08-24 0:02 ` [PATCH 4/6] test index-pack on packs with recoverable " Jeff King
2013-08-24 0:02 ` [PATCH 5/6] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-24 0:02 ` [PATCH 6/6] default pack.indexDuplicates to false Jeff King
2013-08-23 19:41 ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Johannes Sixt
2013-08-23 23:37 ` Jeff King
2013-08-22 23:14 ` [PATCH 3/6] add tests for indexing packs with delta cycles Jeff King
2013-08-22 23:15 ` [PATCH 4/6] test index-pack on packs with recoverable " Jeff King
2013-08-22 23:15 ` [PATCH 5/6] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-22 23:16 ` [PATCH 6/6] default pack.indexDuplicates to false Jeff King
2013-08-22 23:35 ` [PATCHv2 0/6] duplicate objects and delta cycles, oh my! Nicolas Pitre
2013-08-21 20:53 ` [PATCH 3/4] reject duplicates when indexing a packfile we created Jeff King
2013-08-21 20:55 ` [DO NOT APPLY PATCH 4/4] index-pack: optionally skip duplicate packfile entries Jeff King
2013-08-21 23:20 ` Junio C Hamano [this message]
2013-08-22 0:47 ` Jeff King
2013-08-21 22:17 ` [RFC/PATCH 0/4] duplicate objects in packfiles Junio C Hamano
2013-08-14 18:50 ` duplicate objects in packfile Nicolas Pitre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqob8qskbc.fsf@gitster.dls.corp.google.com \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=nico@fluxnic.net \
--cc=peff@peff.net \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.