From: Junio C Hamano <junkio@cox.net>
To: Frank Sorenson <frank@tuxrocks.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH] git-repack-script: Add option to repack all objects
Date: Mon, 29 Aug 2005 09:48:23 -0700
Message-ID: <7vvf1obsfc.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: 4312BC27.9010604@tuxrocks.com
Frank Sorenson <frank@tuxrocks.com> writes:
> It reduces the disk space requirement significantly (linux packs from
> 135MB to 73MB), and I'm seeing speed improvements as well (probably
> because cache-cold operation requires far less seeking, and the caching
> requirements are smaller).
>
> What are the benefits to keeping old packs?
For a private repository that one develops in and pushes to
public repositories from, packing everything into one pack and
pruning everything else (including old packs) is always the
optimum thing, if one can afford the time to repack.
There are no benefits to _keeping_ old packs, but there may be
benefits not to pack everything into one huge one when other
people are involved.
Suppose I have currently three packs (one since the beginning of
time to some time ago, one incremental on top of it, another
incremental on top of the other two). Somebody cloned from my
repository reasonably early in the project timeline (he has only
the first pack), somebody else cloned yesterday (has all three
packs). Then "git count-objects" reports that many other objects
are unpacked, and I decide it is time to repack.
At this point I could pack everything into one new big pack
and remove old packs. Or I could create the fourth incremental.
Another possibility, which is what I currently do by hand,
is to create a pack that is incremental on top of the first two,
and replace the latest incremental with it.
Now these two people want to fetch from my repository while the
third person wants to clone from scratch. Which repacking
strategy gives the best transfer to these three people? Having
a single huge pack favors the newcomer and penalizes the old
timers. In particular, the current http-pull does not have the
smarts to pick a better pack when an object is found in more than
one pack, so leaving old packs around would not help.
In principle, leaving the old packs around could help all of them. In the
above example, I could create the fourth incremental _and_ a
superpack that has everything in it. The newcomer would slurp
in the superpack, the one with only the first pack can use one
of the second+third+fourth or the superpack, and the one with
all three can use the fourth pack.
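The trade-off above can be sketched with a toy model. Everything in it is an assumption made for illustration: objects are plain integers, the pack sizes (60, 20, 10, plus 10 unpacked objects) are invented, cost is counted in objects rather than bytes, and the fetcher is assumed to be smart enough to pick the cheapest set of whole packs, which the current http-pull is not.

```python
from itertools import combinations

def cheapest_fetch(have, packs):
    """Return (cost, pack names) for the cheapest set of whole packs
    that supplies every object the client does not have yet.
    Cost is the number of objects downloaded, duplicates included."""
    wanted = set().union(*packs.values()) - have
    if not wanted:
        return 0, ()
    best = None
    names = sorted(packs)
    # Brute force over all subsets; fine for a handful of packs.
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(packs[n] for n in combo))
            if wanted <= covered:
                cost = sum(len(packs[n]) for n in combo)
                if best is None or cost < best[0]:
                    best = (cost, combo)
    return best

# Hypothetical repository: three existing packs plus unpacked objects.
p1, p2, p3 = set(range(0, 60)), set(range(60, 80)), set(range(80, 90))
loose = set(range(90, 100))
everything = p1 | p2 | p3 | loose

strategies = {
    "one big pack": {"all": everything},
    "fourth incremental": {"p1": p1, "p2": p2, "p3": p3, "p4": loose},
    "replace latest": {"p1": p1, "p2": p2, "p3b": p3 | loose},
    "incrementals + superpack": {"p1": p1, "p2": p2, "p3": p3,
                                 "p4": loose, "all": everything},
}

clients = {
    "early cloner": p1,              # has only the first pack
    "recent cloner": p1 | p2 | p3,   # has all three packs
    "newcomer": set(),               # clones from scratch
}

table = {
    strategy: {who: cheapest_fetch(have, packs)[0]
               for who, have in clients.items()}
    for strategy, packs in strategies.items()
}

for strategy, costs in table.items():
    print(strategy, costs)
```

Because the model counts objects and not bytes, it shows how a single big pack penalizes the old timers, but it cannot show what the newcomer gains from a superpack; that gain comes from better compression in one pack, which this sketch ignores.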
Having said that, packing has an interesting compression
characteristic. Repacking the three existing packs (from the
example) along with the unpacked objects into one pack would
result in a very small pack, compared to the sum of the three
existing packs, depending on how often you repack. In that
sense, it may not be such a big deal to force everybody to
re-fetch everything even if most of them are already locally
available, by repacking everything into one.
> I disagree about not removing old packs.
I am not saying we should not remove old packs. I am saying that
repacking, choosing which pack to remove and doing the actual
removing should be kept as separate steps and in separate
commands, perhaps the latter two as part of "git prune".
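As a rough sketch of that separation, the three steps can be modelled as independent operations on a mapping from pack name to the set of objects it contains. The function names, pack names and object numbers below are hypothetical and do not correspond to any existing git command; a pack is treated as removable only when every object in it is also in a pack that is being kept.

```python
def packs_made_redundant(packs, keep):
    """Step 2: choose packs outside `keep` whose every object
    is also present in at least one kept pack."""
    kept_objects = set()
    for name in keep:
        kept_objects |= packs[name]
    return sorted(name for name, objs in packs.items()
                  if name not in keep and objs <= kept_objects)

def remove_packs(packs, doomed):
    """Step 3: return the pack set with the chosen packs removed."""
    return {name: objs for name, objs in packs.items()
            if name not in doomed}

# Hypothetical starting point: three packs and two unpacked objects.
packs = {"p1": {1, 2, 3}, "p2": {4, 5}, "p3": {6}}
loose = {7, 8}

# Step 1: repack. Here the latest incremental is rebuilt to include
# the unpacked objects; nothing is deleted at this point.
packs["p3b"] = packs["p3"] | loose

# Step 2: decide what can go, as a separate decision.
doomed = packs_made_redundant(packs, keep={"p1", "p2", "p3b"})

# Step 3: actually remove, again as a separate action.
remaining = remove_packs(packs, doomed)
print(doomed, sorted(remaining))
```

Keeping step 2 separate leaves room for a policy such as retaining an old pack that remote fetchers still benefit from, even when it is technically redundant.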
-jc