From: Junio C Hamano <junkio@cox.net>
To: Petr Baudis <pasky@suse.cz>
Cc: Linus Torvalds <torvalds@osdl.org>, git@vger.kernel.org
Subject: Re: [ANNOUNCE] Cogito-0.12
Date: Thu, 07 Jul 2005 10:21:39 -0700
Message-ID: <7vk6k2sfa4.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: 20050707144501.GG19781@pasky.ji.cz
>>>>> "PB" == Petr Baudis <pasky@suse.cz> writes:
PB> It won't happen. Or rather, I hope the HTTP pulls become more efficient
PB> soon. Actually, perhaps Linus has something done already, my workstation
PB> is a bit derailed now so I couldn't pull from him in the last few days
PB> (hopefully will sort that out today).
PB> Hmm, yes, I guess Linus won't be touching the HTTP backend at all. ;-) I
PB> suggest you check the latest development in Linus' branch and sync with
PB> Daniel Barkalow, who promised to improve the pull tools as well.
If this weekend is not too late, I have been brewing what is
called an "efficient pull from dumb servers" suite, which would
hopefully fill this gap. I am still in the process of finishing
the details, but basically it already seems to work.
Linus, please drop the patch I sent you earlier (privately,
because I forgot to CC the list) that implemented only the
server end. I have already changed some file formats since then.
The outline of how it works is like this.
* I assume a dumb transport (read: an HTTP server that serves
static files only) and no on-request server-side processing.
All the smarts must go in the client. The server-side X.git is
an ordinary GIT archive (no need for files in the work tree),
plus:
- X.git/objects/pack can have packed GIT archives. I
envision that this will be a series of 5 to 20 MB packs,
occasionally adding a new incremental pack when
X.git/objects/??/ directories accumulate enough standalone
SHA1 files. It is not necessary to have X.git/objects/??/
files if an object is contained in one of the packs.
- X.git/info/ has three extra files.
- "inventory" lists all the branches stored in X.git/refs
and looks like this (the SHA1 each ref holds, then its path):
ff83c8f3554ceb444b413beaeb49b4a781dae944 snap/0
013e7c7ff498aae82d799f80da37fbd395545456 snap/10
ff83c8f3554ceb444b413beaeb49b4a781dae944 heads/master
dd7ba8b4949535c24e604a37709db0e3be9ccbbc heads/linus
This is to facilitate discovery from a transport that is
not so "ls" friendly, like HTTP.
- "pack" lists available packs under X.git/objects/pack and
looks like this (size and name):
432495 pk-65fe69e9bc2e8a3e0881e008dde182522156ba7c.pack
The file is there for discovery. The client uses the size to
work out an optimal set of packs to slurp.
- "rev-cache" is a binary file that describes commit
ancestry information in a dense format. It lists every
commit available from this repository together with that
commit's parents. This file is produced append-only, so
that the server side can use an rsync-based mirroring scheme.
A new command "git-update-dumb-server" is used to prepare
these three files. A helper script may also be needed that
uses git-pack-objects and friends to prepare packs partitioned
so that a popular branch can be pulled efficiently.
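To make the two text formats above concrete, here is a minimal sketch of
how a client might read "inventory" and "pack". It assumes only what the
samples show: one whitespace-separated pair per line. The function names
are made up for illustration and are not part of any posted patch.

```python
from typing import Dict, List, Tuple


def parse_inventory(text: str) -> Dict[str, str]:
    """Map each ref path (e.g. 'heads/master') to the SHA1 listed for it."""
    refs: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        sha1, path = line.split(None, 1)
        refs[path.strip()] = sha1
    return refs


def parse_pack_list(text: str) -> List[Tuple[int, str]]:
    """Return (size_in_bytes, pack_name) pairs from the 'pack' file."""
    packs: List[Tuple[int, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        size, name = line.split(None, 1)
        packs.append((int(size), name.strip()))
    return packs


# Sample data copied from the listings above.
inventory = parse_inventory(
    "ff83c8f3554ceb444b413beaeb49b4a781dae944 snap/0\n"
    "dd7ba8b4949535c24e604a37709db0e3be9ccbbc heads/linus\n"
)
packs = parse_pack_list(
    "432495 pk-65fe69e9bc2e8a3e0881e008dde182522156ba7c.pack\n"
)
```

The binary "rev-cache" layout is not described here, so no parser for it
is attempted.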
* The client side is called "git-dumb-pull-script". This
downloads the above three files, and the .idx files associated
with the packs described in "pack". Using the information in
"inventory" about the desired branch to pull from, along with
the "rev-cache" ancestry information, it discovers the set of
commits that are lacking from its local store. By comparing
that list with the downloaded .idx files, along with the size
information for each pack, it comes up with a list of packs to
download that cover the most commits it wants to obtain. It
then downloads them, verifies them and stores them in its
.git/objects/pack/ directory.
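The client-side selection just described could be sketched roughly as
follows. This is an illustration under stated assumptions, not the
actual script: it assumes "rev-cache" has already been decoded into a
dict from commit to parent list, that each .idx has been decoded into a
set of object names, and that the local store is complete below any
commit it already has. The greedy "most wanted commits per byte" rule is
my stand-in heuristic; the real selection logic may differ.

```python
from typing import Dict, List, Optional, Set


def missing_commits(tip: str, parents: Dict[str, List[str]],
                    have: Set[str]) -> Set[str]:
    """Walk ancestry from tip, stopping at commits already held locally."""
    missing: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in have or commit in missing:
            continue
        missing.add(commit)
        stack.extend(parents.get(commit, []))
    return missing


def choose_packs(wanted: Set[str], pack_objects: Dict[str, Set[str]],
                 pack_size: Dict[str, int]) -> List[str]:
    """Greedy heuristic: keep taking the pack that covers the most
    still-wanted commits per byte until no pack helps any more."""
    remaining = set(wanted)
    chosen: List[str] = []
    while remaining:
        best: Optional[str] = None
        best_score = 0.0
        for name, objects in pack_objects.items():
            if name in chosen:
                continue
            score = len(objects & remaining) / max(pack_size[name], 1)
            if score > best_score:
                best, best_score = name, score
        if best is None:
            break  # no remaining pack contains anything we still want
        chosen.append(best)
        remaining -= pack_objects[best]
    return chosen


# Toy data with made-up commit and pack names.
parents = {"c3": ["c2"], "c2": ["c1"], "c1": ["c0"], "c0": []}
wanted = missing_commits("c3", parents, have={"c0"})
chosen = choose_packs(
    wanted,
    pack_objects={"pk-a.pack": {"c1", "c2"}, "pk-b.pack": {"c2"}},
    pack_size={"pk-a.pack": 1000, "pk-b.pack": 900},
)
```

In the toy data, "c3" is in neither pack, so the selection stops after
one pack and leaves that commit uncovered.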
The above process of downloading packs would typically not
cover all the things lacking, because some new commits may
not be in any of the packs. After this point, the usual
commit-walking git-http-pull can be used to fill the rest,
and it does not have to pull that many objects. Dan's
http-pull parallelism improvement would be very useful
independently here.
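As a small illustration of that last step, the commits left over for
the commit walker are simply the wanted set minus whatever the
downloaded packs contain. The names below are hypothetical and the .idx
contents are assumed to be available as sets of object names.

```python
from typing import Dict, Iterable, Set


def commits_left_for_walker(wanted: Set[str], downloaded: Iterable[str],
                            pack_objects: Dict[str, Set[str]]) -> Set[str]:
    """Return the wanted commits that no downloaded pack contains."""
    covered: Set[str] = set()
    for name in downloaded:
        covered |= pack_objects[name]
    return wanted - covered


# Toy data with made-up commit and pack names.
left = commits_left_for_walker(
    {"c1", "c2", "c3"},
    ["pk-a.pack"],
    {"pk-a.pack": {"c1", "c2"}},
)
```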