From: Junio C Hamano <junkio@cox.net>
To: Dan Holmsand <holmsand@gmail.com>
Cc: Daniel Barkalow <barkalow@iabervon.org>,
torvalds@osdl.org, git@vger.kernel.org
Subject: Re: [RFC] Design for http-pull on repo with packs
Date: Mon, 11 Jul 2005 16:30:48 -0700
Message-ID: <7vu0j0ncnr.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: 42D2960E.3050008@gmail.com

Dan Holmsand <holmsand@gmail.com> writes:
> I did a little experiment. I cloned Linus' current tree, and git
> repacked everything (that's 63M + 3.3M worth of pack files). Then I
> got 25 or so of Jeff's branches. That's 6.9M of object
> files, and 1.4M packed. Total size: 70M for the entire
> .git/objects/pack directory.
>
> Repacking all of that to a single pack file gives, somewhat
> surprisingly, a pack size of 62M (+ 1.3M index). In other words, the
> cost of getting all those branches, and all of the new stuff from
> Linus, turns out to be *negative* (probably due to some strange
> deltification coincidence).

We do _not_ want to optimize for initial slurps into empty
repositories. Quite the opposite: we want to optimize for
quick updates of reasonably up-to-date developer repos.
If initial slurps are _also_ efficient, that is an added
bonus, and the baseline big pack (the 60M Linus pack) already
gives us that. So repacking everything into a single pack
nightly is _not_ what we want to do, even though that would
give the maximum compression ;-). I know you understand this,
but stating the second of the above paragraphs on its own
could give casual readers the wrong impression.
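
To make the cost concrete, here is a toy Python model of a dumb (HTTP) puller. It assumes that the puller must download any pack file containing at least one object it lacks, and it compares two layouts: the untouched baseline pack plus a small incremental pack, versus everything repacked into one pack overnight. Object names and sizes are made-up units for illustration, not measurements from this thread.

```python
from typing import Dict, Set, Tuple

# Hypothetical model: pack name -> (object names in the pack, pack size in arbitrary units).
Pack = Tuple[Set[str], int]


def bytes_to_fetch(have: Set[str], packs: Dict[str, Pack]) -> int:
    """Total size of the packs a puller must download to obtain every object.

    Assumption: a dumb transport cannot fetch part of a pack, so any pack
    holding at least one missing object is downloaded in full.
    """
    total = 0
    for objects, size in packs.values():
        if objects - have:  # this pack contains something we do not have yet
            total += size
    return total


# The puller is up to date except for 3 objects published since the last pull.
have = {f"obj{i}" for i in range(1000)}
new = {"obj1000", "obj1001", "obj1002"}

# Layout A: baseline pack left alone, new objects in a small incremental pack.
incremental_layout = {
    "pack-baseline": (set(have), 60_000),
    "pack-incr": (new, 30),
}
# Layout B: everything repacked into a single, slightly smaller pack.
full_repack_layout = {
    "pack-all": (have | new, 59_000),
}

print(bytes_to_fetch(have, incremental_layout))  # 30
print(bytes_to_fetch(have, full_repack_layout))  # 59000
```

The single pack is smaller in total, which is the "maximum compression" mentioned above, yet in this model it forces an almost up-to-date developer to download the whole thing again.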

> I think this shows that, at least in this case, having many
> branches isn't particularly wasteful (1.4M here, with one
> incremental pack).
> And that fewer packs beat many packs quite handily.

You are correct. For somebody like Jeff, having the Linus
baseline pack plus one pack of all of his heads (an incremental
pack that excludes what is already in the Linus baseline pack)
would help pullers.
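
A rough sketch of how the contents of such an incremental pack could be chosen: take every object reachable from Jeff's heads and subtract what the baseline pack already holds. The object graph and all names below are hypothetical. In git itself, a comparable object list can be produced with `git rev-list --objects` using `--not` to exclude the baseline head, and fed to `git pack-objects`; the Python here only illustrates the set arithmetic.

```python
from typing import Dict, List, Set

# Hypothetical toy object graph: each object maps to the objects it refers to
# (commit -> parent commits and its tree, tree -> blobs).
GRAPH: Dict[str, List[str]] = {
    "linus-c2": ["linus-c1", "tree2"],
    "linus-c1": ["tree1"],
    "tree1": ["blob-a"],
    "tree2": ["blob-a", "blob-b"],
    "jeff-c3": ["linus-c2", "tree3"],
    "tree3": ["blob-a", "blob-b", "blob-c"],
    "blob-a": [],
    "blob-b": [],
    "blob-c": [],
}


def reachable(heads: List[str], graph: Dict[str, List[str]]) -> Set[str]:
    """Return every object reachable from the given heads (iterative DFS)."""
    seen: Set[str] = set()
    stack = list(heads)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(graph[obj])
    return seen


def incremental_pack(heads: List[str], baseline: Set[str],
                     graph: Dict[str, List[str]]) -> Set[str]:
    """Objects for an incremental pack: reachable from heads, minus the baseline pack."""
    return reachable(heads, graph) - baseline


baseline = reachable(["linus-c2"], GRAPH)  # stands in for the Linus baseline pack
print(sorted(incremental_pack(["jeff-c3"], baseline, GRAPH)))
# ['blob-c', 'jeff-c3', 'tree3']
```

Only Jeff's own commit, tree and new blob end up in the extra pack; everything shared with Linus stays in the baseline pack that pullers most likely already have.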

> The big problem, however, comes when Jeff (or anyone else) decides to
> repack. Then, if you fetch both his repo and Linus', you might end up
> with several really big pack files that mostly overlap. That could
> easily mean storing most objects many times, if you don't do some
> smart selective un/repacking when fetching.

Indeed. Overlapping packs are a possibility, but my gut feeling
is that it would not be too bad if things are arranged so that
packs are expanded and then repacked _very_ rarely, if ever.
Instead, at least for your public repository, I think you would
be OK if you only repack incrementally.
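
The difference between the two repacking habits can be sketched in a few lines of Python. This is a toy model under stated assumptions: packs are plain sets of object names, `duplicated` counts redundant copies a puller would store if it fetched every pack, and the hypothetical `repack_incremental` only adds a new pack for objects not already packed, leaving existing packs untouched. For comparison, plain `git repack` packs only unpacked objects, while `git repack -a -d` rewrites everything into a single pack; the code below does not call git.

```python
from collections import Counter
from typing import Dict, Set

Packs = Dict[str, Set[str]]  # hypothetical: pack name -> object names it contains


def duplicated(packs: Packs) -> int:
    """Number of redundant object copies across all packs (copies beyond the first)."""
    counts = Counter(obj for objects in packs.values() for obj in objects)
    return sum(n - 1 for n in counts.values())


def repack_incremental(packs: Packs, loose: Set[str], name: str) -> Packs:
    """Add one pack holding only those objects not already in any existing pack.

    Existing packs are carried over unchanged, so a puller who already has
    them needs to fetch only the new pack.
    """
    packed = set().union(*packs.values())
    result = dict(packs)
    new_objects = loose - packed
    if new_objects:
        result[name] = new_objects
    return result


# Hypothetical object sets: 100 objects from Linus, 2 objects of Jeff's own.
linus = {f"obj{i}" for i in range(100)}
jeff_only = {"jeff1", "jeff2"}

# Jeff does a full repack: his pack repeats every one of Linus's objects.
overlapping = {"pack-linus": set(linus), "pack-jeff-full": linus | jeff_only}
print(duplicated(overlapping))  # 100

# Jeff repacks incrementally on top of the baseline instead.
incremental = repack_incremental({"pack-linus": set(linus)}, linus | jeff_only, "pack-jeff-incr")
print(sorted(incremental["pack-jeff-incr"]))  # ['jeff1', 'jeff2']
print(duplicated(incremental))  # 0
```

In this model, overlap only appears when someone rewrites packs wholesale; as long as new packs are built on top of the existing ones, nothing is stored twice.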