From: A Large Angry SCM <gitzilla@gmail.com>
To: Nicolas Pitre <nico@fluxnic.net>
Cc: Junio C Hamano <gitster@pobox.com>,
Shawn Pearce <spearce@spearce.org>,
Johannes Sixt <j.sixt@viscovery.net>,
git@vger.kernel.org, John Hawley <warthog19@eaglescrag.net>
Subject: Re: [RFC] Add --create-cache to repack
Date: Sun, 30 Jan 2011 12:41:14 -0500
Message-ID: <4D45A2BA.8040908@gmail.com>
In-Reply-To: <alpine.LFD.2.00.1101301208270.8580@xanadu.home>

On 01/30/2011 12:14 PM, Nicolas Pitre wrote:
> On Sat, 29 Jan 2011, Junio C Hamano wrote:
>
>> Shawn Pearce<spearce@spearce.org> writes:
>>
>>> I fully implemented the reuse of a cached pack behind a thin pack idea
>>> I was trying to describe in this thread. It saved 1m7s off the JGit
>>> running time, but increased the data transfer by 25 MiB. I didn't
>>> expect this much of an increase, I honestly expected the thin pack
>>> portion to be, well, thinner. The issue is the thin pack cannot delta
>>> against all of the history; it's only delta compressing against the tip
>>> of the cached pack. So long-lived side branches that forked off an
>>> older part of the history aren't delta compressing well, or at all,
>>> and that is significantly bloating the thin pack. (It's also why that
>>> "newer" pack is 57M, but should be 14M if correctly combined with the
>>> cached pack.) If I were to consider all of the objects in the cached
>>> pack as potential delta base candidates for the thin pack, the entire
>>> benefit of the cached pack disappears.
>>
>> What if you instead use the cached pack this way?
>>
>> 0. You perform the proposed pre-traversal until you hit the tip of cached
>> pack(s), and realize that you will end up sending everything.
>>
>> 1. Instead of sending the new part of the history first and then sending
>> the cached pack(s), you send the contents of cached pack(s), but also
>> note what objects you sent;
>>
>> 2. Then you send the new part of the history, taking full advantage of
>> what you have already sent, perhaps doing only half of the reuse-delta
>> logic (i.e. you reuse what you can reuse, but you do _not_ punt on an
>> object that is not a delta in an existing pack).
>
> The problem is to determine the best base object to delta against. If
> you end up listing all the already sent objects and perform delta
> attempts against them for the remaining non delta objects to find the
> best match then you might end up taking more CPU time than the current
> enumeration phase.

Why worry about finding the best base here? Just add the object (or one
of the objects) at the same path in the commit you found in step 0,
above, to the delta base candidates for each object to pack.
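
Roughly, in Python rather than real pack-objects code, the idea might look
like the sketch below. Everything here is hypothetical and for illustration
only: `Obj`, `tip_tree` (a path-to-object-id map for the tree of the commit
at the tip of the cached pack), and `same_path_bases` are made-up names, not
anything in git or JGit. It only picks candidates; the actual delta attempt
is left out.

```python
from typing import Dict, List, NamedTuple


class Obj(NamedTuple):
    """A new object to pack (hypothetical stand-in for a pack entry)."""
    oid: str   # object id
    path: str  # path the object was reached by during traversal


def same_path_bases(new_objects: List[Obj],
                    tip_tree: Dict[str, str]) -> Dict[str, List[str]]:
    """For each new object, offer the already-sent object at the same path
    in the cached pack's tip commit as a delta base candidate.

    No search over the whole cached pack: one dictionary lookup per object.
    """
    candidates: Dict[str, List[str]] = {}
    for obj in new_objects:
        base = tip_tree.get(obj.path)
        # No candidate if the path did not exist at the tip, or if the
        # object is identical to what was already sent.
        if base is None or base == obj.oid:
            candidates[obj.oid] = []
        else:
            candidates[obj.oid] = [base]
    return candidates


if __name__ == "__main__":
    tip = {"Makefile": "a0", "cache.h": "c0"}
    new = [Obj("a1", "Makefile"), Obj("b2", "new-file.c")]
    print(same_path_bases(new, tip))
```

The candidate is not guaranteed to be the best base, only a cheap and
usually reasonable one, which is the point: the cost stays proportional to
the number of new objects rather than to the size of the cached pack.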