* upload-pack packfile caching
@ 2008-09-16 17:52 Scott Chacon
2008-09-16 20:59 ` Nicolas Pitre
From: Scott Chacon @ 2008-09-16 17:52 UTC
To: git list
I was wondering if it would be of general interest to have upload-pack
take an option to cache packfiles. Unless I am mistaken, every clone
on a git server will recreate the same packfile until something new is
pushed into it, correct? I thought it might be a good idea to pass an
option to have it cache the packfile that is created if
create_full_pack is set and re-use it until the repository is updated.
If I patched upload-pack to do this, would there be any interest in
it?
Scott
* Re: upload-pack packfile caching
From: Nicolas Pitre @ 2008-09-16 20:59 UTC
To: Scott Chacon; +Cc: git list
On Tue, 16 Sep 2008, Scott Chacon wrote:
> I was wondering if it would be of general interest to have upload-pack
> take an option to cache packfiles. Unless I am mistaken, every clone
> on a git server will recreate the same packfile until something new is
> pushed into it, correct? I thought it might be a good idea to pass an
> option to have it cache the packfile that is created if
> create_full_pack is set and re-use it until the repository is updated.
> If I patched upload-pack to do this, would there be any interest in
> it?
Well, if you do that there are a few things to be careful about.
First, having a server process able to write files is a security hazard.
If you want to create a pack cache then it is best if it is created
manually by the repository owner. You don't want someone cloning a
repository actually messing with such a cache.
Secondly, the dynamic creation of a pack currently takes into account the
capabilities of the client so as not to produce a pack with features that
the client does not support. So in order not to have to cache packs with
many feature combinations, this cache should probably only take effect
if the pack capabilities of the server are also supported by the client.
Now, the _only_ advantage of a cached pack file is in avoiding execution
of rev-list. Otherwise creation of a pack for streaming is almost
identical to straight copying of data from disk due to pack data reuse.
The rev-list can be made faster by having the pack-objects process do
the object listing itself instead of piping the output from rev-list
into it ('git repack' does that but 'git-upload-pack' doesn't). And I
believe that rev-list could be made much much faster with pack v4.
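
To illustrate the point about object listing, here is a minimal shell
sketch contrasting the two approaches. The function names are made up
for illustration; `--revs`, `--all` and `--stdout` are existing
pack-objects options (`--all` implies `--revs`), but this is only a
sketch of the idea, not what upload-pack itself runs.

```shell
# Two-process form: rev-list enumerates the objects and pipes the
# list into pack-objects, which only does the packing.
pack_via_rev_list() {
    git rev-list --objects --all | git pack-objects --stdout
}

# Single-process form: pack-objects walks the history itself.
# --all implies --revs; no extra revisions are supplied on stdin.
pack_via_internal_walk() {
    git pack-objects --all --stdout </dev/null
}
```

Both functions write a complete pack to standard output; the second
one simply avoids the separate rev-list process and the pipe.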
That being said...
What you could have is a simple file with 2 SHA1s: the first
corresponding to the output of 'git for-each-ref' and the second one
corresponding to the list of all objects reachable from those refs.
For example:
1) git for-each-ref --format="%(objectname)" --sort=objectname | sha1sum
2) git for-each-ref --format="%(objectname)" | \
   xargs git rev-list --objects | cut -c -40 | sort | sha1sum
So, if you do the above in a freshly cloned repository, you'll find that
the SHA1 in 2) corresponds to this:
3) git show-index < .git/objects/pack/pack-*.idx | cut -f2 -d' ' | sha1sum
which means that all objects reachable from all refs are found in the
only pack you have.
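
The three commands above can be wrapped into small shell functions so
the comparison is easy to repeat. This is only a sketch: the function
names are hypothetical, and `cut -c -40` assumes 40-character SHA-1
object names, as in the commands above.

```shell
# Hash a list of object names read from stdin, one per line.
# Sorting first makes the digest independent of the input order.
hash_names() {
    sort | sha1sum | cut -d' ' -f1
}

# Step 1: digest of the values of all refs.
refs_digest() {
    git for-each-ref --format='%(objectname)' | hash_names
}

# Step 2: digest of every object reachable from those refs.
reachable_digest() {
    git for-each-ref --format='%(objectname)' |
        xargs git rev-list --objects | cut -c -40 | hash_names
}

# Step 3: digest of every object listed in one pack index file ($1).
pack_digest() {
    git show-index < "$1" | cut -f2 -d' ' | hash_names
}
```

In a freshly cloned repository with a single pack, one would then
expect `reachable_digest` and
`pack_digest .git/objects/pack/pack-*.idx` to print the same value.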
Now, if the SHA1 in 2) is computed over the binary representation of all
those object names, you'll find out that it corresponds to the actual
pack name in the .git/objects/pack/ directory.
So what upload-pack could do is look for a special file with those 2
SHA1s. If it exists, see if the first SHA1 matches the list of values
for all refs; if so, then the name of the pack to send out corresponds
to the second SHA1. If that pack is found in the repository then you
just have to stream it out.
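
The lookup described above could be sketched in shell as follows. The
file name `pack-cache` and its two-line layout (refs digest on the
first line, pack name on the second) are assumptions made for this
example; nothing in git defines such a file.

```shell
# Print the path of the cached pack if the cache is still valid.
# $1: path to the git directory
# $2: digest of the current ref values
# Returns non-zero if there is no usable cached pack.
cached_pack_path() {
    cache_file="$1/pack-cache"    # hypothetical file name
    [ -r "$cache_file" ] || return 1

    # First line: refs digest; second line: pack name.
    { read -r cached_refs && read -r pack_name; } < "$cache_file" || return 1

    # The refs changed since the cache was written: do not use it.
    [ "$cached_refs" = "$2" ] || return 1

    pack="$1/objects/pack/pack-$pack_name.pack"
    [ -f "$pack" ] || return 1
    printf '%s\n' "$pack"
}
```

A caller would stream the printed file to the client on success and
fall back to normal dynamic pack generation on failure.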
Creating that file is then just a matter of doing the equivalent of the
above commands and repacking your repository into a single pack.
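
A repository owner could create such a file with something like the
sketch below. The function name and the `pack-cache` file name are
hypothetical; `git repack -a -d` is the existing command for
collapsing a repository into a single pack.

```shell
# Repack into a single pack and record the two values in a cache file.
# Must be run from inside the repository by its owner.
write_pack_cache() {
    git_dir=$(git rev-parse --git-dir) || return 1
    git repack -a -d -q || return 1

    # Expect exactly one pack afterwards (kept packs would break this).
    set -- "$git_dir"/objects/pack/pack-*.pack
    [ $# -eq 1 ] && [ -f "$1" ] || return 1

    # Take the pack name from the file on disk.
    pack_name=$(basename "$1" .pack)
    pack_name=${pack_name#pack-}

    refs_digest=$(git for-each-ref --format='%(objectname)' \
        --sort=objectname | sha1sum | cut -d' ' -f1)

    # Hypothetical layout: refs digest, then pack name.
    printf '%s\n%s\n' "$refs_digest" "$pack_name" > "$git_dir/pack-cache"
}
```

Reading the pack name from the directory, rather than recomputing it,
keeps the sketch independent of how the name is derived.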
Nicolas