From: Nicolas Pitre <nico@cam.org>
To: Scott Chacon <schacon@gmail.com>
Cc: git list <git@vger.kernel.org>
Subject: Re: upload-pack packfile caching
Date: Tue, 16 Sep 2008 16:59:31 -0400 (EDT) [thread overview]
Message-ID: <alpine.LFD.1.10.0809161549240.6279@xanadu.home> (raw)
In-Reply-To: <d411cc4a0809161052h43b1be7dh9b322c8ec7d49add@mail.gmail.com>
On Tue, 16 Sep 2008, Scott Chacon wrote:
> I was wondering if it would be of general interest to have upload-pack
> take an option to cache packfiles. Unless I am mistaken, every clone
> on a git server will recreate the same packfile until something new is
> pushed into it, correct? I thought it might be a good idea to pass an
> option to have it cache the packfile that is created if
> create_full_pack is set and re-use it until the repository is updated.
> If I patched upload-pack to do this, would there be any interest in
> it?
Well, if you do that there are a few things to be careful about.
First, having a server process able to write files is a security hazard.
If you want to create a pack cache then it is best if created manually
by the repository owner. You don't want someone cloning a repository
actually messing with such cache.
Secondly, the dynamic creation of a pack currently take into account the
capabilities of the client so not to produce a pack with features that
the client does not support. So in order not to have to cache pack with
many feature combinations, this cache should probably only take effect
if pack capabilities of the server are also supported by the client.
Now, the _only_ advantage of a cached pack file is in avoiding execution
of rev-list. Otherwise creation of a pack for streaming is almost
identical to straight copying of data from disk due to pack data reuse.
The rev-list can be made faster by having the pack-objects process do
the object listing itself instead of piping the output from rev-list
into it ('git repack' does that but 'git-upload-pack' doesn't). And I
believe that rev-list could be made much much faster with pack v4.
That been said...
What you could have is a simple file with 2 SHA1s: the first
corresponding to the output of 'git for-each-ref' and the second one
corresponding to the list of all objects reachable from those refs.
For example:
1) git for-each-ref --format="%(objectname)" --sort=objectname | sha1sum
2) git for-each-ref --format="%(objectname)" | \
xargs git rev-list --objects | cut -c -40 | sort | sha1sum
So, if you do the above in a freshly cloned repository, you'll find that
the SHA1 in 2) corresponds to this:
3) git show-index < .git/objects/pack/pack-*.idx | cut -f2 -d' ' | sha1sum
which means that all objects reachable from all refs are found in the
only pack you have.
Now, if the SHA1 in 2) is computed over the binary representation of all
those object names, you'll find out that it corresponds to the actual
pack name in the .git/objects/pack/ directory.
So what upload-pack could do is look for a special file with those 2
SHA1s, and if it exists then see if the first SHA1 matches the list of
values for all refs, if so then the name of the pack to send out
corresponds to the second SHA1. If that pack is found in the repository
then you just have to stream it out.
Creating that file is then just a matter of doing the equivalent of the
above commands and repacking your repository
into a single pack.
Nicolas
prev parent reply other threads:[~2008-09-16 21:00 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-09-16 17:52 upload-pack packfile caching Scott Chacon
2008-09-16 20:59 ` Nicolas Pitre [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.1.10.0809161549240.6279@xanadu.home \
--to=nico@cam.org \
--cc=git@vger.kernel.org \
--cc=schacon@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).