git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* small question about the repack algorithm
@ 2008-02-07  9:03 Pierre Habouzit
  2008-02-08  7:12 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Pierre Habouzit @ 2008-02-07  9:03 UTC (permalink / raw)
  To: Git ML

[-- Attachment #1: Type: text/plain, Size: 2170 bytes --]


  I've trying to see if that optimization was used but I was somehow
unable to find if it was the case, as the code is a bit tough :)

  I was wondering if the repacking window was using triangle inequality
to discard trying some costly deltas (I assume that what costs the most
in the repacking is computing the delta). I mean, if you consider the
"size" of a delta, I'm almost sure that it's very near a distance.

  So assuming that we know the delta sizes between any pair of reference
objects in the window, well, if an object we want to delta against the
window Od are near one reference O1 enough, for each Oi in the window
that holds: len(δ(O1, Oi)) > 2 * len(δ(Od, O1)), then it's not worth
investigating.

  I've seen quite many heuristics tried to avoid computing a delta at
the beginning of builtin-pack-objects.c:try_delta, but I don't seem to
find this one used, whereas it would even encompass most of the
heuristics here. I mean the object-size based heuristics are (if I'm
correct) a special case of the previous, e.g. I think one of the
heuristics is to not trying a delta if:
  | len(Oi) - len(Od) | > 2 * len(δ(Od, O1))

But it's just a subcase of the former because
  len(δ(Od, Oi)) >= | len(Oi) - len(Od) |


  Of course it needs us to know how objects in the same window diff
against one each other, of have a good estimation of it (having not too
bad lower estimates of the real deltas is an option here, as it just
makes the test a bit less discriminant, but can allow us to have a
trade-off between computing each crossed deltas in the window and
dropping tried deltas early). Note that we don't need to store the
actual deltas, just their length (or a more complicated score if we want
-- probably -- take the chain depth into account, we just need that
score to look like a distance enough).


  If it's already done, then don't mind me, though I'd like to be
pointed to where so that I can grok that code a bit more :)
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2008-02-08  8:39 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-07  9:03 small question about the repack algorithm Pierre Habouzit
2008-02-08  7:12 ` Junio C Hamano
2008-02-08  8:38   ` Pierre Habouzit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).