From: Nicolas Pitre <nico@cam.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH/RFC] reverse the pack-objects delta window logic
Date: Wed, 26 Apr 2006 11:48:53 -0400 (EDT) [thread overview]
Message-ID: <Pine.LNX.4.64.0604261108430.18520@localhost.localdomain> (raw)
In-Reply-To: <7vpsj4sxer.fsf@assigned-by-dhcp.cox.net>
On Tue, 25 Apr 2006, Junio C Hamano wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
> > Note, this is a RFC particularly to Junio since the resulting pack is
> > larger than without the patch with git-repack -a -f. However using a
> > subsequent git-repack -a brings the pack size down to expected size. So
> > I'm not sure I've got everything right.
>
> I haven't tested it seriously yet, but there is nothing that
> looks obviously wrong that might cause the inflation problem,
> from the cursory look after applying the patch on top of your
> last round.
>
> > + if (nr_objects == nr_result && trg_entry->delta_limit >= max_depth)
> > + return 0;
>
> The older code was loosening this check only for a delta chain
> that is already in pack (which is limited to its previous
> max_depth). The end result is almost the same -- a thin pack
> recipient would have deeper delta than it asked. The difference
> is that the earlier code had implicit 2*max_depth limit,
Ah. Indeed. Didn't realize that. I can restore that behavior quite
easily if necessary.
> but this one makes the chain length unbounded, which I do not think it
> is necessarily a bad change.
Well as long as the thin pack doesn't carry too many revisions it should
be fine since, as the comment in the code sais, those packs are always
unpacked.
Initially I had a bug where the delta depth was completely ignored. I
was pretty excited when repacking the kernel produced a pack 20% smaller
although I didn't know why at that time. But when attempting another
git-repack -a -f then the initial object counting was sooooooo
slooooooooow.
> > - /*
> > - * NOTE!
> > - *
> > - * We always delta from the bigger to the smaller, since that's
> > - * more space-efficient (deletes don't have to say _what_ they
> > - * delete).
> > - */
>
> This comment by Linus still applies, even though the scan order
> is now reversed; no need to remove it.
This is not exactly true. In general it is so, but as we fixed the
deltification of objects with the same name but in different directories
it is well possible to go from smaller to larger and leaving that
comment there is misleading.
This is also why I changed the sizediff rule such that:
sizediff = src_size < size ? size - src_size : 0;
Since the src buffer already has its delta index computed, it costs
almost nothing to attempt matching much smaller objects against it.
However if we go from small to larger then the previous logic still
applies.
> > + if (trg_entry->delta) {
> > + /*
> > + * The target object already has a delta base but we just
> > + * found a better one. Remove it from its former base
> > + * childhood and redetermine the base delta_limit (if used).
> > + */
>
> And you are making the delta chain unbound for thin case, you
> can probably omit this with the same if() here; the
> recomputation seems rather expensive.
Ah right. I was doing so partly, but I can skip any tree maintenance
altogether in that case as well.
> > + if (!size)
> > + continue;
> > + delta_index = create_delta_index(n->data, size);
> > + if (!delta_index)
> > + die("out of memory");
>
> It might be worth saying "if (size < 50)" here as well; no point
> wasting the delta window for small sources.
Good point. No real effect on the pack size though.
> > -#if 0
> > - /* if we made n a delta, and if n is already at max
> > - * depth, leaving it in the window is pointless. we
> > - * should evict it first.
> > - * ... in theory only; somehow this makes things worse.
> > - */
> > - if (entry->delta && depth <= entry->depth)
> > - continue;
> > -#endif
>
> I was almost tempted to suggest that the degradation you are
> seeing might be related to this mystery I did not get around to
> solve. By allowing to give chance to try delta against less
> optimum candidates, it appeared that we ended up making the
> final pack size bigger than otherwise, which suggests that our
> choice between plain undeltified and a delta half its size might
> be favoring delta too much. But it does not appear to be
> related to the inflation you are seeing.
Certainly not, since git-repack -a may only delta _more_ and the pack
size actualy goes down a lot in my case.
The mystery I'm facing is why would a second pass with git-repack -a fix
things? It has a different window behavior since objects already
deltified do not occupy window space. Hmmm. That would certainly
explain why doing a git-repack -a after a git-repack -a -f produces a
smaller pack even currently.
> BTW, have you tried it without --no-reuse-pack on an object list
> that is not thin? It appears you are busting the depth limit.
>
> Using the same "git rev-list --objects v1.2.3..v1.3.0" as input,
> git-pack-objects without --no-reuse-pack gives this
> distribution:
>
> chain length = 1: 364 objects
> chain length = 2: 269 objects
> chain length = 3: 198 objects
> chain length = 4: 164 objects
> chain length = 5: 148 objects
> chain length = 6: 123 objects
> chain length = 7: 122 objects
> chain length = 8: 103 objects
> chain length = 9: 92 objects
> chain length = 10: 234 objects
> chain length = 11: 12 objects
> chain length = 12: 1 object
> chain length = 13: 2 objects
Oops. OK fixed.
> So it _might_ be that the depth limiting code is subtly broken
> which is causing you throw away a perfectly good delta base
> which in turn results in a bad pack.
Actually no. That bug instead allowed each given base to deltify more
targets than it should have.
Nicolas
next prev parent reply other threads:[~2006-04-26 15:48 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-04-26 3:37 [PATCH/RFC] reverse the pack-objects delta window logic Nicolas Pitre
2006-04-26 5:45 ` Junio C Hamano
2006-04-26 15:48 ` Nicolas Pitre [this message]
2006-04-26 17:48 ` Nicolas Pitre
2006-04-27 3:05 ` Nicolas Pitre
2006-04-27 3:58 ` [PATCH] use delta index data when finding best delta matches Nicolas Pitre
2006-04-28 1:08 ` Junio C Hamano
2006-04-28 1:56 ` Nicolas Pitre
2006-04-27 5:04 ` [PATCH/RFC] reverse the pack-objects delta window logic Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0604261108430.18520@localhost.localdomain \
--to=nico@cam.org \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).