* [ANNOUNCE] Cogito-0.12
From: Petr Baudis @ 2005-07-03 23:46 UTC
To: git

Hello,

I'm happy to announce the release of the 0.12 version of the Cogito SCM-like layer over Linus' GIT tree history storage tool. Get it at

	http://www.kernel.org/pub/software/scm/cogito/

or via cg-update if you have an older version cloned.

I wanted to release it later with more cool features, but releasing often is good after all, people will get to test things more, and I wanted to make it possible for kernel.org to upgrade to a newer RPM. But it may not be as stable as I'd wish and may have some rough edges, so be warned.

This release contains the latest stuff from Linus, with all the packing stuff and everything. Other things include heaps of bugfixes, enhanced option parsing, ~/.cgrc support, cg-push, a real cg-tag, and plenty of smaller but nice stuff. And more to come in the next few days!

About cg-push, it:

(i) works only locally or over git+ssh branches

(ii) the head updated on the other side must be 'master' too
     (high priority to fix)

(iii) the head updated on the other side is re-created, thus losing
      all attributes (ownership, permissions)
      (high priority to fix)

(iv) won't update the remote working tree if there is any associated
     with the repository - do cg-cancel to catch up, but that will
     lose any local changes you did (note that I plan to rename
     cg-cancel to cg-reset)

Also, I've deprecated rsync, as I explained in another mail. Use cg-branch-chg to change the branch URLs to some more sensible scheme - most likely HTTP, or SSH if you want to push as well.

Have fun,

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..
* Re: [ANNOUNCE] Cogito-0.12
From: Brian Gerst @ 2005-07-06 12:01 UTC
To: Petr Baudis; +Cc: git

Petr Baudis wrote:
> [...]
> Also, I've deprecated rsync, as I explained in another mail. Use
> cg-branch-chg to change the branch URLs to some more sensible scheme -
> most likely HTTP, or SSH if you want to push as well.

I really question removing rsync before HTTP pulls become more efficient.
I did a complete pull of cogito from kernel.org, and http took over 50 minutes to pull everything, while rsync was done in just over 1 minute. I dared not even try to pull the full kernel at that speed.

I suspect that part of the problem is that the pull methods are doing a depth-first search, so we can't request the next object until the current object is fully received and parsed. Changing to a breadth-first search would allow multiple requests in flight and asynchronous processing, which should speed things up. I am exploring using the curl_multi_* functions to do this, but this will require changes to common code in pull.c.

--
Brian Gerst
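To make Brian's depth-first versus breadth-first point concrete, here is a minimal C sketch that only counts round trips over a made-up in-memory object graph. It assumes a depth-first walk issues one request and waits for each object before it can see that object's references, while a breadth-first walk can request a whole level at once (as curl_multi_* would allow). The `struct obj` layout and both walker functions are hypothetical and are not taken from pull.c.

```c
#include <string.h>

/* Hypothetical in-memory object graph: each object names up to
 * MAX_REFS other objects (a commit its tree and parents, a tree its
 * entries).  Small integers stand in for SHA1s. */
#define MAX_OBJS 64
#define MAX_REFS 4

struct obj {
	int nrefs;
	int refs[MAX_REFS];
};

/* Depth-first: an object's references are only known once it has
 * arrived and been parsed, and only one request is outstanding, so
 * every distinct object costs one full round trip. */
static int dfs_round_trips(const struct obj *g, int root, char *seen)
{
	int i, trips = 1;	/* request this object, wait, parse it */

	seen[root] = 1;
	for (i = 0; i < g[root].nrefs; i++)
		if (!seen[g[root].refs[i]])
			trips += dfs_round_trips(g, g[root].refs[i], seen);
	return trips;
}

/* Breadth-first: everything referenced by the current frontier is
 * already known, so the next level can be requested all at once;
 * one round trip per level. */
static int bfs_round_trips(const struct obj *g, int root)
{
	int frontier[MAX_OBJS], next[MAX_OBJS];
	int nf = 1, nn, i, j, trips = 0;
	char seen[MAX_OBJS];

	memset(seen, 0, sizeof(seen));
	frontier[0] = root;
	seen[root] = 1;
	while (nf) {
		trips++;	/* the whole frontier is in flight together */
		nn = 0;
		for (i = 0; i < nf; i++) {
			const struct obj *o = &g[frontier[i]];
			for (j = 0; j < o->nrefs; j++) {
				int r = o->refs[j];
				if (!seen[r]) {
					seen[r] = 1;
					next[nn++] = r;
				}
			}
		}
		memcpy(frontier, next, nn * sizeof(int));
		nf = nn;
	}
	return trips;
}
```

For a commit whose tree holds three blobs, and whose parent commit's tree shares two of them, the depth-first walk needs one round trip per distinct object (seven here) while the breadth-first walk needs one per level (three). How much of that shows up in practice depends on how many requests the server and libcurl actually keep in flight.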
* Re: [ANNOUNCE] Cogito-0.12
From: Petr Baudis @ 2005-07-07 14:45 UTC
To: Brian Gerst; +Cc: git

Dear diary, on Wed, Jul 06, 2005 at 02:01:38PM CEST, I got a letter
where Brian Gerst <bgerst@didntduck.org> told me that...
> Petr Baudis wrote:
> > Also, I've deprecated rsync, as I explained in another mail. Use
> > cg-branch-chg to change the branch URLs to some more sensible scheme -
> > most likely HTTP, or SSH if you want to push as well.
>
> I really question removing rsync before HTTP pulls become more
> efficient.

It won't happen. Or rather, I hope the HTTP pulls become more efficient soon. Actually, perhaps Linus has done something already; my workstation is a bit derailed now, so I couldn't pull from him in the last few days (hopefully I will sort that out today).

> I did a complete pull of cogito from kernel.org, and http
> took over 50 minutes to pull everything, while rsync was done in just
> over 1 minute. I dared not even try to pull the full kernel at that speed.
>
> I suspect that part of the problem is that the pull methods are doing a
> depth-first search, so we can't request the next object until the
> current object is fully received and parsed. Changing to a breadth-first
> search would allow multiple requests in flight and asynchronous
> processing, which should speed things up. I am exploring using the
> curl_multi_* functions to do this, but this will require changes to
> common code in pull.c.

Hmm, yes, I guess Linus won't be touching the HTTP backend at all. ;-) I suggest you check the latest development in Linus' branch and sync with Daniel Barkalow, who promised to improve the pull tools as well.

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..
* Re: [ANNOUNCE] Cogito-0.12
From: Junio C Hamano @ 2005-07-07 17:21 UTC
To: Petr Baudis; +Cc: Linus Torvalds, git

>>>>> "PB" == Petr Baudis <pasky@suse.cz> writes:

PB> It won't happen. Or rather, I hope the HTTP pulls become more efficient
PB> soon. Actually, perhaps Linus has done something already; my workstation
PB> is a bit derailed now, so I couldn't pull from him in the last few days
PB> (hopefully I will sort that out today).

PB> Hmm, yes, I guess Linus won't be touching the HTTP backend at all. ;-) I
PB> suggest you check the latest development in Linus' branch and sync with
PB> Daniel Barkalow, who promised to improve the pull tools as well.

If this weekend is not too late, I have been brewing what I call an "efficient pull from dumb servers" suite, which would hopefully fill this gap. I am still in the process of finishing the details, but basically it already seems to work.

Linus, please drop the patch I sent you earlier (privately, by mistake, without CCing the list), which implemented only the server end. I've already changed some file formats since that one.

The outline of how it works is like this.

 * I assume a dumb transport (read: a static-files-only HTTP server) and no on-request server-side processing. All the smarts must go in the client.

   The server side X.git is an ordinary GIT archive (no need for files in the work tree), plus:

   - X.git/objects/pack can have packed GIT archives. I envision that this will be a series of 5 to 20 MB packs, occasionally adding a new incremental pack when X.git/objects/??/ directories accumulate enough standalone SHA1 files. It is not necessary to have X.git/objects/??/ files if an object is contained in one of the packs.

   - X.git/info/ has three extra files.
     - "inventory" lists all the branches stored in X.git/refs and looks like this (contents and path):

         ff83c8f3554ceb444b413beaeb49b4a781dae944 snap/0
         013e7c7ff498aae82d799f80da37fbd395545456 snap/10
         ff83c8f3554ceb444b413beaeb49b4a781dae944 heads/master
         dd7ba8b4949535c24e604a37709db0e3be9ccbbc heads/linus

       This is to facilitate discovery over a transport that is not so "ls"-friendly, like HTTP.

     - "pack" lists available packs under X.git/objects/pack and looks like this (size and name):

         432495 pk-65fe69e9bc2e8a3e0881e008dde182522156ba7c.pack

       The file is there for discovery. The size is used by the client to discover the optimum set of packs to slurp.

     - "rev-cache" is a binary file that describes commit ancestry information in a dense format. It lists all commits available from this repository along with the parents of each commit. This file is produced append-only, so that the server side can use an rsync-based mirroring scheme.

   A new command "git-update-dumb-server" is used to prepare these three files. There may need to be a helper script that uses git-pack-objects and friends to prepare packs partitioned to allow pulling a popular branch efficiently.

 * The client side is called "git-dumb-pull-script". This downloads the above three files, and the .idx files associated with the packs described in "pack". With the information in "inventory" about the desired branch to pull from, along with the "rev-cache" ancestry information, it discovers the set of commits that are lacking from its local store. By comparing that list with the downloaded .idx files, along with the size information for each pack, it comes up with a list of packs to download to cover the most commits that it wants to obtain, and downloads them, verifies them and stores them in its .git/objects/pack/ directory.

   The above process of downloading packs would typically not cover all the things lacking, because some new commits may not be in any of the packs.
   After this point, the usual commit-walking git-http-pull can be used to fill the rest, and it does not have to pull that many objects. Dan's http-pull parallelism improvement would be very useful independently here.
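Junio does not spell out how the client picks packs from the sizes in "pack" and the contents of the .idx files. One plausible approach, assumed here rather than taken from his code, is a greedy weighted set cover: keep choosing the pack that supplies the most still-missing commits per byte. The C sketch below represents the missing commits as bits of a 64-bit mask; `struct pack_info` and `choose_packs()` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical per-pack summary the client could build from the
 * "pack" list (size) and the pack's .idx file (which of the commits
 * we lack it contains).  Bit i stands for missing commit i. */
struct pack_info {
	unsigned long size;	/* bytes, from info/pack */
	uint64_t commits;	/* missing commits found in its .idx */
};

static int popcount64(uint64_t x)
{
	int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Greedy weighted set cover: repeatedly take the pack that covers the
 * most still-missing commits per byte.  Writes the chosen pack indices
 * to 'chosen' and returns how many there are; *left receives the
 * commits that no pack covers, to be fetched loose afterwards. */
static int choose_packs(const struct pack_info *p, int npacks,
			uint64_t missing, int *chosen, uint64_t *left)
{
	int nchosen = 0;

	while (missing) {
		int i, best = -1;
		double best_ratio = 0.0;

		for (i = 0; i < npacks; i++) {
			int gain = popcount64(p[i].commits & missing);
			double ratio;

			if (!gain)
				continue;
			ratio = (double)gain / (double)(p[i].size ? p[i].size : 1);
			if (best < 0 || ratio > best_ratio) {
				best = i;
				best_ratio = ratio;
			}
		}
		if (best < 0)
			break;	/* no pack supplies anything still missing */
		chosen[nchosen++] = best;
		missing &= ~p[best].commits;
	}
	*left = missing;
	return nchosen;
}
```

Whatever `*left` reports at the end is what the commit-walking git-http-pull would still have to fetch object by object. Greedy set cover is not guaranteed to be optimal, but it is cheap and has a well-known logarithmic approximation bound.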
* Re: [ANNOUNCE] Cogito-0.12
From: Linus Torvalds @ 2005-07-07 19:04 UTC
To: Junio C Hamano; +Cc: Petr Baudis, git

On Thu, 7 Jul 2005, Junio C Hamano wrote:
>
>    - X.git/objects/pack can have packed GIT archives. I
>      envision that this will be a series of 5 to 20 MB packs,
>      occasionally adding a new incremental pack when
>      X.git/objects/??/ directories accumulate enough standalone
>      SHA1 files. It is not necessary to have X.git/objects/??/
>      files if an object is contained in one of the packs.

Note that I just re-packed the kernel archive on kernel.org, and removed _all_ unpacked files. Once that percolates to the mirrors, the http protocol will be useless without anything like this.

That said, I really think the dumb protocols are useless anyway. No other system supports pure static object pulling anyway, and as far as I'm concerned, I want "rsync" to kind of work (but it won't be optimal, since re-packing will delete all the old objects and replace them with the new pack, which is downloaded anew). But plain http? I'm not convinced.

I'd much rather have a "stupid server" that just listens to a port, and basically forks off and executes "git-upload-pack" when it's connected to (perhaps reading the directory name first). Nothing else. Then we can do a security analysis of upload-pack, which should be fairly easy since it's not actually ever _writing_ anything.

At that point, you can do

	git pull git://www.kernel.org/pub/scm/git/..

and it would just connect to some default "git port", pass off the directory name, and be done with it - the exact same discovery protocol that we now use for ssh. And "git clone" would also automatically work.

		Linus
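As a rough illustration of how small Linus's "stupid server" could be, here is a C sketch of the per-connection part only. The request format (one absolute directory name terminated by a newline) and the path checks are assumptions made for this example; the listening socket, the accept() loop and fork() are left out. The child hands the connection to git-upload-pack by pointing stdin and stdout at the socket.

```c
#include <string.h>
#include <unistd.h>

#define MAX_DIR 1024

/* Hypothetical request format: the client sends the repository
 * directory as a single line, e.g. "/pub/scm/git/git.git\n".
 * Copies it into 'dir' and returns 0 if acceptable, -1 otherwise.
 * upload-pack only reads, so the main concern is that the path
 * cannot escape the exported area. */
static int parse_request(const char *req, char *dir, size_t dirsz)
{
	const char *nl = strchr(req, '\n');
	size_t len;

	if (!nl)
		return -1;		/* no complete line yet */
	len = nl - req;
	if (len == 0 || len >= dirsz)
		return -1;
	if (req[0] != '/')
		return -1;		/* require an absolute path */
	memcpy(dir, req, len);
	dir[len] = '\0';
	if (strstr(dir, "/../") || (len >= 3 && !strcmp(dir + len - 3, "/..")))
		return -1;		/* no walking out via ".." */
	return 0;
}

/* Run in the forked child: make the socket the program's stdin and
 * stdout, then become git-upload-pack for the requested directory. */
static void run_upload_pack(int sock, const char *dir)
{
	char *argv[3];

	argv[0] = "git-upload-pack";
	argv[1] = (char *)dir;
	argv[2] = NULL;
	dup2(sock, 0);
	dup2(sock, 1);
	execvp(argv[0], argv);
	_exit(127);		/* only reached if exec failed */
}
```

A real server would probably also confine the path to a configured base directory rather than accepting any absolute path.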
* Re: [ANNOUNCE] Cogito-0.12
From: Junio C Hamano @ 2005-07-07 19:57 UTC
To: Linus Torvalds; +Cc: Petr Baudis, git

I have two questions on "rev-list --objects".

 (1) Would it make sense to have an extra flag to "rev-list
     --objects" to make it list all the objects reachable from
     commits listed in its output, even when some of them are
     unchanged from UNINTERESTING commits? Right now, a pack
     produced from "rev-list --objects A ^B" does not have enough
     information to reproduce the tree associated with commit A.

 (2) When "showing --objects", it lists the top-level tree node
     with no name, which makes it indistinguishable from commit
     objects by pack-objects, probably impacting the delta logic.
     Would something like the following patch make sense, to name
     such node "."; giving full-path not just the basename to
     all named nodes would be even better, though.

---
# - master: git-format-patch: Prepare patches for e-mail submission.
# + (working tree)
diff --git a/rev-list.c b/rev-list.c
--- a/rev-list.c
+++ b/rev-list.c
@@ -179,7 +179,10 @@ static void show_commit_list(struct comm
 			die("unknown pending object %s (%s)", sha1_to_hex(obj->sha1), name);
 		}
 		while (objects) {
-			printf("%s %s\n", sha1_to_hex(objects->item->sha1), objects->name);
+			const char *name = objects->name;
+			if (!*name && objects->item->type == tree_type)
+				name = ".";
+			printf("%s %s\n", sha1_to_hex(objects->item->sha1), name);
 			objects = objects->next;
 		}
 	}
* Re: [ANNOUNCE] Cogito-0.12
From: Linus Torvalds @ 2005-07-07 21:58 UTC
To: Junio C Hamano; +Cc: Petr Baudis, git

On Thu, 7 Jul 2005, Junio C Hamano wrote:
>
> (1) Would it make sense to have an extra flag to "rev-list
>     --objects" to make it list all the objects reachable from
>     commits listed in its output, even when some of them are
>     unchanged from UNINTERESTING commits? Right now, a pack
>     produced from "rev-list --objects A ^B" does not have enough
>     information to reproduce the tree associated with commit A.

Well, that would certainly be possible. Just having a flag that disables "mark_tree_uninteresting()" would do it.

> (2) When "showing --objects", it lists the top-level tree node
>     with no name, which makes it indistinguishable from commit
>     objects by pack-objects, probably impacting the delta logic.
>     Would something like the following patch make sense, to name
>     such node "."; giving full-path not just the basename to
>     all named nodes would be even better, though.

It doesn't impact the delta algorithm, because the objects are sorted by type first, so it never mixes up trees and commits.

But if you wanted to, something like this would be cleaner than your suggestion..
		Linus

diff --git a/rev-list.c b/rev-list.c
--- a/rev-list.c
+++ b/rev-list.c
@@ -154,7 +154,7 @@ static void show_commit_list(struct comm
 	while (list) {
 		struct commit *commit = pop_most_recent_commit(&list, SEEN);
 
-		p = process_tree(commit->tree, p, "");
+		p = process_tree(commit->tree, p, "tree");
 		if (process_commit(commit) == STOP)
 			break;
 	}
@@ -386,7 +386,7 @@ static struct commit *get_commit_referen
 			mark_tree_uninteresting(tree);
 			return NULL;
 		}
-		add_pending_object(object, "");
+		add_pending_object(object, "tree");
 		return NULL;
 	}
 
@@ -401,7 +401,7 @@ static struct commit *get_commit_referen
 			mark_blob_uninteresting(blob);
 			return NULL;
 		}
-		add_pending_object(object, "");
+		add_pending_object(object, "blob");
 		return NULL;
 	}
 	die("%s is unknown object", name);
* Re: [ANNOUNCE] Cogito-0.12
From: Junio C Hamano @ 2005-07-07 22:10 UTC
To: Linus Torvalds; +Cc: Petr Baudis, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

>> (2) When "showing --objects", it lists the top-level tree node
>>     with no name, which makes it indistinguishable from commit
>>     objects by pack-objects, probably impacting the delta logic.
>>     Would something like the following patch make sense, to name
>>     such node "."; giving full-path not just the basename to
>>     all named nodes would be even better, though.

LT> It doesn't impact the delta algorithm, because the objects are sorted by
LT> type first, so it never mixes up trees and commits.

You are correct. I forgot that it does sorting by type.

What do you think about giving the full path, so that Makefiles in different directories would get different name hashes?
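Linus's point about sorting by type first, and Junio's question about name hashes, can be sketched as a qsort() comparator. The enum values, `struct entry` and the FNV-1a style `name_hash()` below are stand-ins chosen for this example, not what pack-objects actually uses.

```c
#include <stdlib.h>

/* Illustrative object types; only their relative order matters,
 * since it keeps each type in its own run after sorting. */
enum obj_type { OBJ_COMMIT = 1, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

struct entry {
	enum obj_type type;
	unsigned int name_hash;	/* hash of the name rev-list printed */
	unsigned long size;
};

/* FNV-1a style string hash, standing in for whatever hash
 * pack-objects really applies to names (an assumption). */
static unsigned int name_hash(const char *name)
{
	unsigned int h = 2166136261u;

	while (*name) {
		h ^= (unsigned char)*name++;
		h *= 16777619u;
	}
	return h;
}

/* Sort by type first, so trees and commits are never neighbours as
 * delta candidates; then by name hash, so same-named files sit
 * together; then larger objects first. */
static int entry_cmp(const void *a_, const void *b_)
{
	const struct entry *a = a_, *b = b_;

	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->name_hash != b->name_hash)
		return a->name_hash < b->name_hash ? -1 : 1;
	if (a->size != b->size)
		return a->size > b->size ? -1 : 1;
	return 0;
}
```

With only basenames, every "Makefile" gets the same hash and lands in one run regardless of directory; hashing a name such as "Documentation/Makefile" instead would normally separate them, which is what Junio is asking about.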
* Re: [ANNOUNCE] Cogito-0.12
From: Junio C Hamano @ 2005-07-07 20:00 UTC
To: Linus Torvalds; +Cc: Petr Baudis, git

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> ... No other
LT> system supports pure static object pulling anyway,...

That is true, but on the other hand, no other system would be easier for mere mortals to deploy on bare-bones ISP accounts.
* Re: [ANNOUNCE] Cogito-0.12
From: Eric W. Biederman @ 2005-07-07 21:29 UTC
To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git

Linus Torvalds <torvalds@osdl.org> writes:

> That said, I really think the dumb protocols are useless anyway. No other
> system supports pure static object pulling anyway, and as far as I'm
> concerned, I want "rsync" to kind of work (but it won't be optimal, since
> re-packing will delete all the old objects and replace them with the new
> pack, which is downloaded anew). But plain http? I'm not convinced.

Have you not looked at tla/arch? tla does support dumb servers. Its job is a little easier, as it has one file per atomic commit; I suspect that once packs start working well, that should not be an issue for git either.

For small projects this is a major benefit, as they can just push their files to a convenient http or ftp server.

> I'd much rather have a "stupid server" that just listens to a port, and
> basically forks off and executes "git-upload-pack" when it's connected to
> (perhaps reading the directory name first). Nothing else. Then we can do
> a security analysis of upload-pack, which should be fairly easy since it's
> not actually ever _writing_ anything.
>
> At that point, you can do
>
> 	git pull git://www.kernel.org/pub/scm/git/..
>
> and it would just connect to some default "git port", pass off the
> directory name, and be done with it - the exact same discovery protocol that
> we now use for ssh. And "git clone" would also automatically work.

For optimizing network bandwidth that sounds like the way to go. For ad hoc development I don't know.
For a central server you still need an authenticated way to push content, which makes it another dimension of the problem.

So it is mostly a question of what is the sanest way to mirror/publish data. http is used a lot for publishing data, and practically everyone has access to an http server that can host content, so I think supporting http makes git a lot more accessible to people. The only thing more accessible seems to be email, and email is terrible for publishing small projects.

Eric
* Re: [ANNOUNCE] Cogito-0.12
From: Linus Torvalds @ 2005-07-07 22:23 UTC
To: Eric W. Biederman; +Cc: Junio C Hamano, Petr Baudis, git

On Thu, 7 Jul 2005, Eric W. Biederman wrote:
>
> For optimizing network bandwidth that sounds like the way to go. For
> ad hoc development I don't know. For a central server you still need
> an authenticated way to push content, which makes it another dimension
> of the problem.

I'm convinced that "ssh" is the only sane way for pushing. If you don't trust somebody enough to give him ssh access, you shouldn't trust him with write access to your project in the first place.

git can actually do ssh with a _very_ restricted shell, if people are worried about shell access. In fact, the _only_ thing the shell needs to be able to do is execute one of two programs, so you could have something _really_ trivial in your /etc/passwd as the login shell that doesn't allow anything else. But you'd still use ssh as the authentication protocol.

So I don't worry about pushing. I think we've got that covered. It's really the anonymous pulling that needs something.

		Linus
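Linus's restricted login shell can be very small, because sshd starts the user's login shell as `<shell> -c '<command>'`, so the shell only has to vet that one command string. The C sketch below assumes the two permitted programs are git-upload-pack and git-receive-pack, each taking a single directory argument; `check_command()` is a hypothetical helper, and the caller would then exec the chosen program directly rather than through /bin/sh.

```c
#include <string.h>

/* The only programs this hypothetical login shell will run; we
 * assume these are the two Linus means (pull and push back ends). */
static const char *allowed[] = { "git-upload-pack", "git-receive-pack" };

/* Accept the command string only if it is one allowed program
 * followed by a single directory argument, optionally in single
 * quotes.  On success copy the directory into 'dir' and return the
 * program's index; otherwise return -1 and run nothing. */
static int check_command(const char *cmd, char *dir, size_t dirsz)
{
	size_t i, n;

	for (i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
		size_t plen = strlen(allowed[i]);
		const char *arg;

		if (strncmp(cmd, allowed[i], plen) || cmd[plen] != ' ')
			continue;
		arg = cmd + plen + 1;
		n = strlen(arg);
		if (n >= 2 && arg[0] == '\'' && arg[n - 1] == '\'') {
			arg++;		/* strip the surrounding quotes */
			n -= 2;
		}
		/* reject empty, overlong, or shell-significant arguments */
		if (n == 0 || n >= dirsz || strcspn(arg, "'\"`$;&|<>\\ \t\n") < n)
			return -1;
		memcpy(dir, arg, n);
		dir[n] = '\0';
		return (int)i;
	}
	return -1;
}
```

Rejecting quotes, whitespace and shell metacharacters in the argument is stricter than necessary (directories containing spaces are refused), but it keeps the security analysis trivial.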
* Re: [ANNOUNCE] Cogito-0.12
From: Eric W. Biederman @ 2005-07-08 2:11 UTC
To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Thu, 7 Jul 2005, Eric W. Biederman wrote:
>>
>> For optimizing network bandwidth that sounds like the way to go. For
>> ad hoc development I don't know. For a central server you still need
>> an authenticated way to push content, which makes it another dimension
>> of the problem.
>
> I'm convinced that "ssh" is the only sane way for pushing. If you don't
> trust somebody enough to give him ssh access, you shouldn't trust him with
> write access to your project in the first place.

Agreed, I brought that up only so I could dismiss it :)

> So I don't worry about pushing. I think we've got that covered. It's
> really the anonymous pulling that needs something.

So long as we remember there is a trade-off between efficiency and ease of setup for anonymous access and small projects.

Eric
* Dumb servers (was: [ANNOUNCE] Cogito-0.12)
From: Kevin Smith @ 2005-07-08 1:54 UTC
Cc: git

Eric W. Biederman wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
>
>>That said, I really think the dumb protocols are useless anyway. No other
>>system supports pure static object pulling anyway, and as far as I'm
>>concerned, I want "rsync" to kind of work (but it won't be optimal, since
>>re-packing will delete all the old objects and replace them with the new
>>pack, which is downloaded anew). But plain http? I'm not convinced.
>
> Have you not looked at tla/arch? tla does support dumb servers.
> Its job is a little easier, as it has one file per atomic commit;
> I suspect that once packs start working well, that should not be an
> issue for git either.

In addition to GNU arch/tla, it is also supported by baz, ArX, darcs, and mercurial.

> For small projects this is a major benefit, as they can just push
> their files to a convenient http or ftp server.

Absolutely. For the kernel it might not make sense, but I view it as a really important feature for tiny projects around the world. Even a CGI requirement makes it impossible to serve a project from free or really cheap web hosts. Plain HTTP is the only protocol available to people who have no extra money to spend on hosting accounts.

This happens to be a hot-button issue for me, in case you can't tell. Sorry if I'm ranting.

Kevin
* Re: Dumb servers (was: [ANNOUNCE] Cogito-0.12)
From: Linus Torvalds @ 2005-07-08 2:27 UTC
To: Kevin Smith; +Cc: Git Mailing List

On Thu, 7 Jul 2005, Kevin Smith wrote:
>
> Absolutely. For the kernel it might not make sense, but I view it as a
> really important feature for tiny projects around the world. Even a CGI
> requirement makes it impossible to serve a project from free or really
> cheap web hosts. Plain HTTP is the only protocol available to people who
> have no extra money to spend on hosting accounts.

Well, the http approach always works as well as "rsync", i.e. you can always replace "rsync" with "wget -r -c" or similar. But the end result will be a purely dumb mirror of what the other side had, i.e. it will have all the same problems rsync has with things like multiple branches etc. (it will get all of them, not just the objects needed from the one branch you're trying to pull).

So it's not pretty. But it obviously does work: pack-files haven't changed the fact that git is an append-only thing that lives entirely in the filesystem space and doesn't have any "dynamic content" (i.e. nothing is hidden inside server state).

		Linus
* Re: [ANNOUNCE] Cogito-0.12
From: Petr Baudis @ 2005-07-07 22:14 UTC
To: Linus Torvalds; +Cc: Junio C Hamano, git

Let me join the sceptics camp. :-)

Dear diary, on Thu, Jul 07, 2005 at 09:04:58PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> Note that I just re-packed the kernel archive on kernel.org, and removed
> _all_ unpacked files. Once that percolates to the mirrors, the http
> protocol will be useless without anything like this.

*grumble*

So what _is_ the way to pull now, actually? If we use rsync, won't we end up having the objects we previously had twice?

> That said, I really think the dumb protocols are useless anyway. No other
> system supports pure static object pulling anyway, and as far as I'm
> concerned, I want "rsync" to kind of work (but it won't be optimal, since
> re-packing will delete all the old objects and replace them with the new
> pack, which is downloaded anew). But plain http? I'm not convinced.

You can always just spider the repository, which will work just as well as rsync in the git case. ;-)

I think it would actually be simplest (for the user) to have a trivial CGI script on the other side which will do the git-upload-pack stuff. Minimal extra administrative overhead, flexibility, works through proxies, and so on. People can rewrite it in Perl or PHorridP if they wish and use it on web hosting servers that don't allow much else.

That's not to say a dedicated server wouldn't have its place too, and that's probably what's simplest for us now. ;-) Now we are in a situation where there's actually no way to pull from your kernel repository without throwing our own repository into a mess and duplicating data, AFAICS.
> I'd much rather have a "stupid server" that just listens to a port, and
> basically forks off and executes "git-upload-pack" when it's connected to
> (perhaps reading the directory name first). Nothing else. Then we can do
> a security analysis of upload-pack, which should be fairly easy since it's
> not actually ever _writing_ anything.
>
> At that point, you can do
>
> 	git pull git://www.kernel.org/pub/scm/git/..
>
> and it would just connect to some default "git port", pass off the
> directory name, and be done with it - the exact same discovery protocol that
> we now use for ssh. And "git clone" would also automatically work.

Eek. Could you please make it at least pretend to be extensible? Compare git-upload-pack with git-ssh-pu* - the second one prepends letters to the data it sends, so that if you add a new type of stuff to send (say, for authentication or some smart tags stuff), you could extend it in a sensible way.

What about dividing the communication into "blocks" separated by a newline? Each block would have its first word on the first line saying what kind of block it is - "refs", "have", "want", or "pack" (for simplicity, the pack block might have the additional restriction that it's always the last one). If you hit an unknown block, you should respond with something like "huh" and ignore the rest of it.

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..
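Petr's block proposal is easy to prototype. The C sketch below assumes blocks are separated by an empty line, classifies each one by the first word of its first line, answers "huh" to any block it does not know, and treats "pack" as the last block. The four type names come from Petr's mail; everything else, including `walk_blocks()`, is made up for illustration, and real pack data would need length-based framing because it can contain NUL bytes.

```c
#include <string.h>

enum block_type { BLOCK_UNKNOWN, BLOCK_REFS, BLOCK_HAVE, BLOCK_WANT, BLOCK_PACK };

static const char *block_names[] = { NULL, "refs", "have", "want", "pack" };

/* Classify a block by the first word of its first line. */
static enum block_type classify(const char *block, size_t len)
{
	size_t w = 0;
	int t;

	while (w < len && block[w] != ' ' && block[w] != '\n')
		w++;
	for (t = BLOCK_REFS; t <= BLOCK_PACK; t++)
		if (strlen(block_names[t]) == w && !strncmp(block, block_names[t], w))
			return (enum block_type)t;
	return BLOCK_UNKNOWN;
}

/* Walk a message made of blocks separated by an empty line.  Every
 * block is counted in 'seen' by type; each unknown block gets "huh\n"
 * appended to 'reply' and is otherwise skipped.  A "pack" block ends
 * the walk, since it is always last and its payload is binary. */
static void walk_blocks(const char *buf, int seen[5], char *reply, size_t replysz)
{
	const char *p = buf, *end;
	size_t len;
	enum block_type t;

	reply[0] = '\0';
	while (*p) {
		end = strstr(p, "\n\n");
		len = end ? (size_t)(end - p) : strlen(p);
		t = classify(p, len);
		seen[t]++;
		if (t == BLOCK_PACK)
			break;
		if (t == BLOCK_UNKNOWN && strlen(reply) + 4 < replysz)
			strcat(reply, "huh\n");
		if (!end)
			break;
		p = end + 2;	/* skip the separating empty line */
	}
}
```

Because an unknown block is answered and skipped rather than treated as fatal, a newer client could add, say, an authentication block without breaking older servers, which is the extensibility Petr is after.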
* Re: [ANNOUNCE] Cogito-0.12
From: Linus Torvalds @ 2005-07-07 22:52 UTC
To: Petr Baudis; +Cc: Junio C Hamano, git

On Fri, 8 Jul 2005, Petr Baudis wrote:

> Let me join the sceptics camp. :-)
>
> Dear diary, on Thu, Jul 07, 2005 at 09:04:58PM CEST, I got a letter
> where Linus Torvalds <torvalds@osdl.org> told me that...
> > Note that I just re-packed the kernel archive on kernel.org, and removed
> > _all_ unpacked files. Once that percolates to the mirrors, the http
> > protocol will be useless without anything like this.
>
> *grumble*
>
> So what _is_ the way to pull now, actually? If we use rsync, won't
> we end up having the objects we previously had twice?

Rsync works fine. You can either unpack the pack you get, or, if you prefer, just run

	git-prune-packed

which will remove the stand-alone objects that it finds in packs. Now you're no longer duplicating data, and your repository is smaller than it used to be anyway.

Of course, that requires that you trust the packs 100%. It seems to be stable, and I've packed the whole kernel repo, but I actually still keep my private tree unpacked, just in case.

> I think it would actually be simplest (for the user) to have a trivial
> CGI script on the other side which will do the git-upload-pack stuff.

Well, git-upload-pack expects the other end to follow the proper protocol, but yes, you can certainly expose it through a web interface and a specialized client that way.

		Linus
* [PATCH] Pull efficiently from a dumb git store.
From: Junio C Hamano @ 2005-07-07 23:16 UTC
To: Linus Torvalds; +Cc: Petr Baudis, git

The git-update-dumb-server-script command statically prepares additional information to describe what the server side has, so that a smart client can pull things efficiently even via a transport such as static-file-only HTTP.

The file prepared by the command is $GIT_DIR/info/server, which is a tar archive that contains the following files:

    rev-cache   -- commit ancestry chain, append only to help rsync mirroring.
    inventory   -- list of refs and their SHA1.
    pack        -- list of available prepackaged packs.
    server.sha1 -- sha1sum output for the above three files (optional).

A smart client git-dumb-pull-script works in the following way:

 - First it slurps these files, and then the .idx files that correspond to the packs described in "pack".

 - Then it finds the commits that it wants from the server by looking at "inventory" to find various heads, "rev-cache" to find commits that are missing from the client, and "pack" to figure out which packs are the most efficient way to fill what is missing from its repository. This is done with the help of the git-dumb-pull-resolve command.

 - Then it slurps the pack files.

 - The git-http-pull / git-local-pull command walks the commit chain in an old-fashioned way and downloads unpacked objects to fill the rest.
Signed-off-by: Junio C Hamano <junkio@cox.net> --- Makefile | 10 + dumb-pull-resolve.c | 239 +++++++++++++++++++++++++++++++++ git-dumb-pull-script | 129 ++++++++++++++++++ git-update-dumb-server-script | 47 ++++++ rev-cache.c | 300 +++++++++++++++++++++++++++++++++++++++++ rev-cache.h | 31 ++++ show-rev-cache.c | 18 ++ update-dumb-server.c | 153 +++++++++++++++++++++ 8 files changed, 925 insertions(+), 2 deletions(-) create mode 100644 dumb-pull-resolve.c create mode 100755 git-dumb-pull-script create mode 100755 git-update-dumb-server-script create mode 100644 rev-cache.c create mode 100644 rev-cache.h create mode 100644 show-rev-cache.c create mode 100644 update-dumb-server.c a880bc7300f070aca3a255828b48390cb9793245 diff --git a/Makefile b/Makefile --- a/Makefile +++ b/Makefile @@ -31,7 +31,8 @@ SCRIPTS=git git-apply-patch-script git-m git-fetch-script git-status-script git-commit-script \ git-log-script git-shortlog git-cvsimport-script git-diff-script \ git-reset-script git-add-script git-checkout-script git-clone-script \ - gitk git-cherry git-rebase-script git-relink-script git-repack-script + gitk git-cherry git-rebase-script git-relink-script git-repack-script \ + git-dumb-pull-script git-update-dumb-server-script PROG= git-update-cache git-diff-files git-init-db git-write-tree \ git-read-tree git-commit-tree git-cat-file git-fsck-cache \ @@ -44,7 +45,8 @@ PROG= git-update-cache git-diff-files git-diff-stages git-rev-parse git-patch-id git-pack-objects \ git-unpack-objects git-verify-pack git-receive-pack git-send-pack \ git-prune-packed git-fetch-pack git-upload-pack git-clone-pack \ - git-show-index + git-show-index git-update-dumb-server git-show-rev-cache \ + git-dumb-pull-resolve all: $(PROG) @@ -58,6 +60,9 @@ LIB_FILE=libgit.a LIB_H=cache.h object.h blob.h tree.h commit.h tag.h delta.h epoch.h csum-file.h \ pack.h pkt-line.h refs.h +LIB_H += rev-cache.h +LIB_OBJS += rev-cache.o + LIB_H += strbuf.h LIB_OBJS += strbuf.o @@ -153,6 +158,7 @@ object.o: 
$(LIB_H) read-cache.o: $(LIB_H) sha1_file.o: $(LIB_H) usage.o: $(LIB_H) +rev-cache.o: $(LIB_H) strbuf.o: $(LIB_H) gitenv.o: $(LIB_H) entry.o: $(LIB_H) diff --git a/dumb-pull-resolve.c b/dumb-pull-resolve.c new file mode 100644 --- /dev/null +++ b/dumb-pull-resolve.c @@ -0,0 +1,239 @@ +#include "cache.h" +#include "rev-cache.h" + +static const char *dumb_pull_resolve_usage = +"git-dumb_pull_resolve <tmpdir> (<remote> <local>)..."; + +static struct inventory { + struct inventory *next; + unsigned char sha1[20]; + char name[1]; /* more; 1 is for terminating NUL */ +} *inventory; + +static struct inventory *find_inventory(const char *name) +{ + struct inventory *e = inventory; + while (e && strcmp(e->name, name)) + e = e->next; + return e; +} + +static void read_inventory(const char *path) +{ + FILE *fp; + char buf[1024]; + + fp = fopen(path, "r"); + if (!fp) + die("cannot open %s", path); + while (fgets(buf, sizeof(buf), fp)) { + struct inventory *e; + int len = strlen(buf); + if (buf[len-1] != '\n') + die("malformed inventory file"); + buf[--len] = 0; + e = xmalloc(sizeof(*e) + len - 41); + strcpy(e->name, buf + 41); + get_sha1_hex(buf, e->sha1); + e->next = inventory; + inventory = e; + } + fclose(fp); +} + +#define MAX_PACKS 0 +static struct pack { + struct pack *next; + unsigned int *map; + unsigned long pack_size; + unsigned long index_size; + unsigned char ix; + unsigned long fill; + char name[1]; /* more; 1 is for terminating NUL */ +} *pack; + +static void map_pack_idx(const char *path, const char *tmpdir) +{ + FILE *fp; + char buf[1024]; + int num_pack = 0; + + fp = fopen(path, "r"); + if (!fp) + die("cannot open %s", path); + while (fgets(buf, sizeof(buf), fp)) { + struct pack *e; + int len; + int fd; + struct stat st; + char path[PATH_MAX]; + char *cp; + + cp = strchr(buf, ' '); + if (!cp || !*++cp) + die("malformed pack file"); + + len = strlen(cp); + if (cp[len-1] != '\n') + die("malformed pack file"); + cp[--len] = 0; + + if (MAX_PACKS && MAX_PACKS < 
num_pack) { + error("cannot handle too many packs. ignoring %s", + cp); + continue; + } + + e = xmalloc(sizeof(*e) + len); + strcpy(e->name, cp); + e->pack_size = strtoul(buf, NULL, 10); + + sprintf(path, "%s/%s", tmpdir, cp); + len = strlen(path); + strcpy(path + len - 5, ".idx"); + fd = open(path, O_RDONLY); + if (fd < 0) + goto ignore_entry; + if (fstat(fd, &st)) { + close(fd); + goto ignore_entry; + } + e->index_size = st.st_size; + e->map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0); + close(fd); + if (e->map == MAP_FAILED) + die("cannot map %s", path); + e->next = pack; + e->ix = num_pack++; + pack = e; + continue; + ignore_entry: + free(e); + } + fclose(fp); +} + +static int find_in_pack_idx(const unsigned char *sha1, struct pack *e) +{ + unsigned int *level1_ofs = e->map; + int hi = ntohl(level1_ofs[*sha1]); + int lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1])); + void *index = e->map + 256; + + do { + int mi = (lo + hi) / 2; + int cmp = memcmp(index + 24 * mi + 4, sha1, 20); + if (!cmp) + return 1; + if (0 < cmp) + hi = mi; + else + lo = mi+1; + } while (lo < hi); + return 0; +} + +static void mark_needed(const unsigned char *sha1) +{ + struct rev_cache *rc; + struct rev_list_elem *rle; + int pos; + + if (has_sha1_file(sha1)) + return; + pos = find_rev_cache(sha1); + if (pos < 0) + die("rev-cache does not match inventory"); + rc = rev_cache[pos]; + rc->work = 1; + for (rle = rc->parents; rle; rle= rle->next) + mark_needed(rle->ri->sha1); +} + +static struct rev_cache *needed; +static unsigned long num_needed; + +static void link_needed(void) +{ + /* Link needed ones for quick traversal */ + int i; + num_needed = 0; + for (i = 0; i < nr_revs; i++) { + struct rev_cache *rc = rev_cache[i]; + if (rc->work) { + rc->work_ptr = needed; + needed = rc; + num_needed++; + } + } +} + +/* Currently this part is stupid, FIXME */ +static void find_optimum_packs(void) +{ + struct rev_cache *rc; + struct pack *e; + unsigned long hits, total; + + hits = 
total = 0; + for (rc = needed; rc; rc = rc->work_ptr) + rc->work = 0; + + for (e = pack; e; e = e->next) { + e->fill = 0; + for (rc = needed; rc; rc = rc->work_ptr) + if (!rc->work && find_in_pack_idx(rc->sha1, e)) { + rc->work = 1<<(e->ix); + e->fill++; + hits++; + } + if (e->fill) { + fprintf(stderr, "use %s to fill %lu\n", + e->name, e->fill); + total += e->pack_size; + } + } + + fprintf(stderr, "# needed %lu, hits %lu, total %lu\n", + num_needed, hits, total); + for (e = pack; e; e = e->next) + if (e->fill) + printf("%s\n", e->name); +} + +int main(int ac, char **av) +{ + int i; + char path[PATH_MAX]; + const char *tmpdir; + + if (ac < 4 || ac % 2) + usage(dumb_pull_resolve_usage); + + tmpdir = av[1]; + ac--; av++; + + sprintf(path, "%s/inventory", tmpdir); + read_inventory(path); + + sprintf(path, "%s/rev-cache", tmpdir); + read_rev_cache(path, NULL, 0); + + for (i = 1; i < ac; i += 2) { + /* av[i] is a remote branch name */ + struct inventory *e = find_inventory(av[i]); + if (!e) { + error("cannot find branch %s", av[i]); + continue; + } + mark_needed(e->sha1); + } + + link_needed(); + + sprintf(path, "%s/pack", tmpdir); + map_pack_idx(path, tmpdir); + + find_optimum_packs(); + return 0; +} diff --git a/git-dumb-pull-script b/git-dumb-pull-script new file mode 100755 --- /dev/null +++ b/git-dumb-pull-script @@ -0,0 +1,129 @@ +#!/bin/sh + +: ${GIT_DIR=.git} +: ${GIT_OBJECT_DIRECTORY="${GIT_DIR}/objects"} + +usage () { + echo >&2 "* git dumb-pull <url> ( <remote-name> <local-name> ) ..." 
+ exit 1 +} + +error () { + echo >&2 "* git-dumb-pull: $*" + exit 1 +} + +download_one() { + # $1 - URL + # $2 - Local target + case "$1" in + file://* ) + path=/$(expr "$1" : 'file:/*\(.*\)') + cp "$path" "$2" || rm -f "$2" + ;; + http://* | https://* ) + wget -O "$2" "$1" || rm -f "$2" + ;; + esac +} + +case "$#" in +0) + usage;; +esac +url="$1"; shift + +case "$url" in +http://* | https://*) + use_url="$url" + cmd='git-http-pull -a -v' + ;; +file://*) + use_url=/$(expr "$url" : 'file:/*\(.*\)') + cmd='git-local-pull -a -l -v' + ;; +*) + error "Unknown url scheme $url" + ;; +esac + +# The rest of arguments are remote and local names +case $#,$(expr "$#" % 2) in +0,* | 1,* | *,1) + error "Need one or more branch name pairs." ;; +esac + +tmp=.git-dumb-pull-$$ +mkdir "$tmp" || error "cannot create temporary directory" +trap "rm -fr $tmp" 0 1 2 3 15 + +# Failing to download is not fatal. It just means the server is +# dumber than we thought ;-) +if download_one "$url/info/server" $tmp/server +then + infofiles='inventory pack rev-cache' + ( + cd $tmp && + tar xvf server $infofiles || exit 1 + if tar xf server server.sha1 + then + sha1sum -c server.sha1 || { + # did we fail because we did not have sha1sum command? + case "$?" in + 127) + : ;; # the command did not exist. + *) + false ;; + esac + } + else + echo >&2 "* warning: server file lacks sha1 checksum" + fi && + rm -f server.sha1 + ) || exit +fi + +if test -f $tmp/pack +then + while read pack_size pack + do + case "$pack" in + */*) + echo >&2 "* malformed pack $pack" + continue + ;; + esac + + idx=$(expr "$pack" : '\(.*\)\.pack$').idx + # It is possible, even likely, that we already have that + # index file and associated pack file. 
+ if test -f "${GIT_OBJECT_DIRECTORY}/pack/$pack" && + test -f "${GIT_OBJECT_DIRECTORY}/pack/$idx" + then + continue + fi + download_one "$url/objects/pack/$idx" "$tmp/$idx" + done <$tmp/pack + + git-dumb-pull-resolve $tmp "$@" | + while read pack + do + echo >&2 "* $pack" + download_one "$url/objects/pack/$pack" "$tmp/$pack" + if test -f "$tmp/$pack" && git-verify-pack "$tmp/$pack" + then + idx=$(expr "$pack" : '\(.*\)\.pack$').idx + mv "$tmp/$pack" "$tmp/$idx" \ + "${GIT_OBJECT_DIRECTORY}/pack/" + fi + done +fi + +while case "$#" in 0) break ;; esac +do + remote="$1" local="$2" + $cmd -w "$local" "$remote" "$use_url" + + shift + shift +done diff --git a/git-update-dumb-server-script b/git-update-dumb-server-script new file mode 100755 --- /dev/null +++ b/git-update-dumb-server-script @@ -0,0 +1,47 @@ +#!/bin/sh +# +# Copyright (c) 2005, Junio C Hamano +# + +: ${GIT_DIR=.git} +: ${GIT_OBJECT_DIRECTORY="$GIT_DIR/objects"} +export GIT_DIR GIT_OBJECT_DIRECTORY + +infofiles='inventory pack rev-cache' + +usage () { + echo >&2 "* git update-dumb-server" + exit 1 +} + +# Allow 10MB plain SHA1 files to be accumulated before we repack. +max_plain_size=10240 + +plain_size=$( +{ + du -sk "$GIT_OBJECT_DIRECTORY/" "$GIT_OBJECT_DIRECTORY/pack/" | + sed -e 's/^[ ]*\([0-9][0-9]*\)[ ].*/\1/' + echo ' - p' +} | dc) && + +if test $max_plain_size -lt $plain_size >/dev/null +then + git-repack-script && git-prune-packed +fi && + +git-update-dumb-server && + +files=$infofiles +cd "$GIT_DIR/info" && +if sha1sum $infofiles >server.sha1 +then + files="$files server.sha1" +else + rm -f server.sha1 + echo >&2 "* warning: creating server file without sha1sum" +fi && +tar cf server $files && + +# We leave rev-cache there for later runs. 
+rm -f server.sha1 inventory pack + diff --git a/rev-cache.c b/rev-cache.c new file mode 100644 --- /dev/null +++ b/rev-cache.c @@ -0,0 +1,300 @@ +#include "refs.h" +#include "cache.h" +#include "rev-cache.h" + +struct rev_cache **rev_cache; +int nr_revs, alloc_revs; + +struct rev_list_elem *rle_free; + +#define BATCH_SIZE 512 + +int find_rev_cache(const unsigned char *sha1) +{ + int lo = 0, hi = nr_revs; + while (lo < hi) { + int mi = (lo + hi) / 2; + struct rev_cache *ri = rev_cache[mi]; + int cmp = memcmp(sha1, ri->sha1, 20); + if (!cmp) + return mi; + if (cmp < 0) + hi = mi; + else + lo = mi + 1; + } + return -lo - 1; +} + +static struct rev_list_elem *alloc_list_elem(void) +{ + struct rev_list_elem *rle; + if (!rle_free) { + int i; + + rle = xmalloc(sizeof(*rle) * BATCH_SIZE); + for (i = 0; i < BATCH_SIZE - 1; i++) { + rle[i].ri = NULL; + rle[i].next = &rle[i + 1]; + } + rle[BATCH_SIZE - 1].ri = NULL; + rle[BATCH_SIZE - 1].next = NULL; + rle_free = rle; + } + rle = rle_free; + rle_free = rle->next; + return rle; +} + +static struct rev_cache *create_rev_cache(const unsigned char *sha1) +{ + struct rev_cache *ri; + int pos = find_rev_cache(sha1); + + if (0 <= pos) + return rev_cache[pos]; + pos = -pos - 1; + if (alloc_revs <= ++nr_revs) { + alloc_revs = alloc_nr(alloc_revs); + rev_cache = xrealloc(rev_cache, sizeof(ri) * alloc_revs); + } + if (pos < nr_revs) + memmove(rev_cache + pos + 1, rev_cache + pos, + (nr_revs - pos - 1) * sizeof(ri)); + ri = xcalloc(1, sizeof(*ri)); + memcpy(ri->sha1, sha1, 20); + rev_cache[pos] = ri; + return ri; +} + +static unsigned char last_sha1[20]; + +static void write_one_rev_cache(FILE *rev_cache_file, struct rev_cache *ri) +{ + unsigned char flag; + struct rev_list_elem *rle; + + if (ri->written) + return; + + if (ri->parsed) { + + /* We use last_sha1 compression only for the first parent; + * otherwise the resulting rev-cache would lose the parent + * order information. 
+ */ + if (ri->parents && + !memcmp(ri->parents->ri->sha1, last_sha1, 20)) + flag = (ri->num_parents - 1) | 0x80; + else + flag = ri->num_parents; + + fwrite(ri->sha1, 20, 1, rev_cache_file); + fwrite(&flag, 1, 1, rev_cache_file); + for (rle = ri->parents; rle; rle = rle->next) { + if (flag & 0x80 && rle == ri->parents) + continue; + fwrite(rle->ri->sha1, 20, 1, rev_cache_file); + } + memcpy(last_sha1, ri->sha1, 20); + ri->written = 1; + } + /* recursively write children depth first */ + for (rle = ri->children; rle; rle = rle->next) + write_one_rev_cache(rev_cache_file, rle->ri); +} + +void write_rev_cache(const char *path) +{ + /* write the following commit ancestry information in + * $GIT_DIR/info/rev-cache. + * + * The format is: + * 20-byte SHA1 (commit ID) + * 1-byte flag: + * - bit 0-6 records "number of parent commit SHA1s to + * follow" (i.e. up to 127 children can be listed). + * - when the bit 7 is on, then "the entry immediately + * before this entry is one of the parents of this + * commit". + * N x 20-byte SHA1 (parent commit IDs) + */ + FILE *rev_cache_file; + int i; + struct rev_cache *ri; + + rev_cache_file = fopen(path, "a"); + if (!rev_cache_file) + die("cannot append to rev cache file."); + + memset(last_sha1, 0, 20); + + /* Go through available rev_cache structures, starting from + * parentless ones first, so that we would get most out of + * last_sha1 optimization by the depth first behaviour of + * write_one_rev_cache(). 
+ */ + for (i = 0; i < nr_revs; i++) { + ri = rev_cache[i]; + if (ri->num_parents) + continue; + write_one_rev_cache(rev_cache_file, ri); + } + /* Then the rest */ + for (i = 0; i < nr_revs; i++) { + ri = rev_cache[i]; + write_one_rev_cache(rev_cache_file, ri); + } + + fclose(rev_cache_file); +} + +static void add_parent(struct rev_cache *child, + const unsigned char *parent_sha1) +{ + struct rev_cache *parent = create_rev_cache(parent_sha1); + struct rev_list_elem *e = alloc_list_elem(); + + /* Keep the parent list ordered in the same way the commit + * object records them. + */ + e->ri = parent; + e->next = NULL; + if (!child->parents_tail) + child->parents = e; + else + child->parents_tail->next = e; + child->parents_tail = e; + child->num_parents++; + + /* There is no inherent order of the children so we just + * LIFO them together. + */ + e = alloc_list_elem(); + e->next = parent->children; + parent->children = e; + e->ri = child; + parent->num_children++; +} + +int read_rev_cache(const char *path, FILE *dumpfile, int dry_run) +{ + unsigned char *map; + int fd; + struct stat st; + unsigned long ofs, len; + struct rev_cache *ri = NULL; + + fd = open(path, O_RDONLY); + if (fd < 0) + return 0; + if (fstat(fd, &st)) { + close(fd); + return -1; + } + map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0); + if (map == MAP_FAILED) { + close(fd); + return -1; + } + close(fd); + + memset(last_sha1, 0, 20); + ofs = 0; + len = st.st_size; + while (ofs < len) { + unsigned char sha1[20]; + int flag, cnt, i; + if (len < ofs + 21) + die("rev-cache too short"); + memcpy(sha1, map + ofs, 20); + flag = map[ofs + 20]; + ofs += 21; + cnt = (flag & 0x7f) + ((flag & 0x80) != 0); + if (len < ofs + (flag & 0x7f) * 20) + die("rev-cache too short to have %d more parents", + (flag & 0x7f)); + if (dumpfile) + fprintf(dumpfile, "%s", sha1_to_hex(sha1)); + if (!dry_run) { + ri = create_rev_cache(sha1); + ri->written = 1; + ri->parsed = 1; + if (!ri) + die("cannot create rev-cache for 
%s", + sha1_to_hex(sha1)); + } + i = 0; + if (flag & 0x80) { + if (!dry_run) + add_parent(ri, last_sha1); + if (dumpfile) + fprintf(dumpfile, " %s", + sha1_to_hex(last_sha1)); + i++; + } + while (i++ < cnt) { + if (!dry_run) + add_parent(ri, map + ofs); + if (dumpfile) + fprintf(dumpfile, " %s", + sha1_to_hex(last_sha1)); + ofs += 20; + } + if (dumpfile) + fprintf(dumpfile, "\n"); + memcpy(last_sha1, sha1, 20); + } + if (ofs != len) + die("rev-cache truncated?"); + munmap(map, len); + return 0; +} + +int record_rev_cache(const unsigned char *sha1) +{ + unsigned char parent[20]; + char type[20]; + unsigned long size, ofs; + unsigned int cnt, i; + void *buf; + struct rev_cache *ri; + + buf = read_sha1_file(sha1, type, &size); + if (!buf) + return 1; /* unavailable */ + if (strcmp(type, "commit")) { + /* could be a tag or tree */ + free(buf); + return 1; + } + ri = create_rev_cache(sha1); + if (ri->parsed) + return 0; + + cnt = 0; + ofs = 46; /* "tree " + hex-sha1 + "\n" */ + while (!memcmp(buf + ofs, "parent ", 7) && + !get_sha1_hex(buf + ofs + 7, parent)) { + ofs += 48; + cnt++; + } + if (cnt * 48 + 46 != ofs) { + free(buf); + return error("internal error in record_rev_cache"); + } + + ri = create_rev_cache(sha1); + ri->parsed = 1; + + for (i = 0; i < cnt; i++) { + unsigned char parent_sha1[20]; + + ofs = 46 + i * 48 + 7; + get_sha1_hex(buf + ofs, parent_sha1); + add_parent(ri, parent_sha1); + record_rev_cache(parent_sha1); + } + free(buf); + return 0; +} diff --git a/rev-cache.h b/rev-cache.h new file mode 100644 --- /dev/null +++ b/rev-cache.h @@ -0,0 +1,31 @@ +#ifndef REV_CACHE_H +#define REV_CACHE_H + +#define REV_CACHE_PATH "info/rev-cache" + +extern struct rev_cache { + struct rev_cache *head_list; + struct rev_list_elem *children; + struct rev_list_elem *parents; + struct rev_list_elem *parents_tail; + unsigned short num_parents; + unsigned short num_children; + unsigned int written : 1; + unsigned int parsed : 1; + unsigned int work : 30; + void *work_ptr; + 
unsigned char sha1[20]; +} **rev_cache; +extern int nr_revs, alloc_revs; + +struct rev_list_elem { + struct rev_list_elem *next; + struct rev_cache *ri; +}; + +extern int find_rev_cache(const unsigned char *); +extern int read_rev_cache(const char *, FILE *, int); +extern int record_rev_cache(const unsigned char *); +extern void write_rev_cache(const char *); + +#endif diff --git a/show-rev-cache.c b/show-rev-cache.c new file mode 100644 --- /dev/null +++ b/show-rev-cache.c @@ -0,0 +1,18 @@ +#include "cache.h" +#include "rev-cache.h" + +static char *dump_rev_cache_usage = +"git-dump-rev-cache <rev-cache-file>"; + +int main(int ac, char **av) +{ + while (1 < ac && av[0][1] == '-') { + /* do flags here */ + break; + ac--; av++; + } + if (ac != 2) + usage(dump_rev_cache_usage); + + return read_rev_cache(av[1], stdout, 1); +} diff --git a/update-dumb-server.c b/update-dumb-server.c new file mode 100644 --- /dev/null +++ b/update-dumb-server.c @@ -0,0 +1,153 @@ +#include "refs.h" +#include "cache.h" +#include "rev-cache.h" + +static FILE *inventory_file; +static int verbose = 0; + +static int do_refs(const char *path, const unsigned char *sha1) +{ + /* path is like .git/refs/heads/master */ + int pfxlen = 10; /* strlen(".git/refs/") */ + fprintf(inventory_file, "%s %s\n", sha1_to_hex(sha1), path + pfxlen); + if (verbose) + fprintf(stderr, "inventory %s %s\n", + sha1_to_hex(sha1), path + pfxlen); + record_rev_cache(sha1); + return 0; +} + +static int inventory(void) +{ + /* write names of $GIT_DIR/refs/?*?/?* files in + * $GIT_DIR/info/inventory, and find the ancestry + * information. 
+ */ + char path[PATH_MAX]; + + strcpy(path, git_path("info/inventory")); + safe_create_leading_directories(path); + inventory_file = fopen(path, "w"); + if (!inventory_file) + die("cannot create inventory file."); + for_each_ref(do_refs); + fclose(inventory_file); + return 0; +} + +static int compare_pack_size(const void *a_, const void *b_) +{ + struct packed_git *const*a = a_; + struct packed_git *const*b = b_; + if ((*a)->pack_size < (*b)->pack_size) + return 1; + else if ((*a)->pack_size == (*b)->pack_size) + return 0; + return -1; +} + +static int write_packs(void) +{ + /* write names of pack files under $GIT_OBJECT_DIRECTORY/pack + * into $GIT_DIR/info/packs. + */ + struct packed_git *p; + char path[PATH_MAX]; + FILE *packs_file; + int pfxlen = strlen(".git/objects/pack/"); + struct packed_git **list; + int cnt, i; + + for (cnt = 0, p = packed_git; p; p = p->next) + cnt++; + list = xmalloc(sizeof(*list) * cnt); + for (i = 0, p = packed_git; p; p = p->next) + list[i++] = p; + qsort(list, cnt, sizeof(*list), compare_pack_size); + + strcpy(path, git_path("info/pack")); + safe_create_leading_directories(path); + packs_file = fopen(path, "w"); + if (!packs_file) + return -1; + for (i = 0; i < cnt; i++) { + p = list[i]; + fprintf(packs_file, "%lu %s\n", + p->pack_size, p->pack_name + pfxlen); + if (verbose) + fprintf(stderr, "pack %lu %s\n", + p->pack_size, + p->pack_name + pfxlen); + } + free(list); + fclose(packs_file); + return 0; +} + +static int inventory_packs(void) +{ + struct packed_git *p; + + for (p = packed_git; p; p = p->next) { + int nth, lim; + lim = num_packed_objects(p); + for (nth = 0; nth < lim; nth++) { + unsigned char sha1[20]; + char type[20]; + if (nth_packed_object_sha1(p, nth, sha1)) { + error("cannot read %dth object from pack %s", + nth, p->pack_name); + continue; + } + if (sha1_object_info(sha1, type, NULL)) { + error("cannot find type of %s", sha1_to_hex(sha1)); + continue; + } + if (strcmp(type, "commit")) + continue; + 
+ record_rev_cache(sha1); + } + } + return 0; +} + +static const char *update_dumb_server_usage = +"git-update-dumb-server [-v] [-a]"; + +int main(int ac, char **av) +{ + char path[PATH_MAX]; + int all_commits = 0; + + while (1 < ac && av[1][0] == '-') { + if (!strcmp(av[1], "-v")) + verbose = 1; + else if (!strcmp(av[1], "-a")) + all_commits = 1; + else + usage(update_dumb_server_usage); + ac--; av++; + } + + /* read existing rev-cache if any */ + strcpy(path, git_path(REV_CACHE_PATH)); + read_rev_cache(path, verbose ? stderr : NULL, 0); + + /* read refs directory and find commit ancentry information */ + inventory(); + + /* + * prepare info/pack file. + * Note that we do prepare_packed_git() in case we ran in + * an headless repository. + */ + prepare_packed_git(); + write_packs(); + + if (all_commits) + inventory_packs(); + + /* update the rev-cache database by appending newly found one to it */ + write_rev_cache(path); + return 0; +} ------------
* [PATCH] rev-list: add "--objects=self-sufficient" flag. 2005-07-07 23:16 ` [PATCH] Pull efficiently from a dumb git store Junio C Hamano @ 2005-07-07 23:50 ` Junio C Hamano 2005-07-07 23:58 ` Linus Torvalds 2005-07-07 23:50 ` [PATCH] Use --objects=self-sufficient flag to rev-list Junio C Hamano 1 sibling, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2005-07-07 23:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: git When --objects=self-sufficient is specified instead of usual "--objects", rev-list shows all objects reachable from trees associated with the commits in its output. This can be used to ensure that a single pack can be used to recreate the tree associated with every commit in it. Signed-off-by: Junio C Hamano <junkio@cox.net> --- *** This makes things easier for the dumb puller because *** self-sufficient pack means less falling back on traditional *** http-pull. rev-list.c | 7 ++- t/t6100-rev-list-object.sh | 97 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 102 insertions(+), 2 deletions(-) create mode 100644 t/t6100-rev-list-object.sh 60563326cea81f89098a88ab716fb4f02e326b43 diff --git a/rev-list.c b/rev-list.c --- a/rev-list.c +++ b/rev-list.c @@ -27,6 +27,7 @@ static int bisect_list = 0; static int tag_objects = 0; static int tree_objects = 0; static int blob_objects = 0; +static int objects_self_sufficient = 0; static int verbose_header = 0; static int show_parents = 0; static int hdr_termination = 0; @@ -198,7 +199,7 @@ static void mark_tree_uninteresting(stru struct object *obj = &tree->object; struct tree_entry_list *entry; - if (!tree_objects) + if (!tree_objects || objects_self_sufficient) return; if (obj->flags & UNINTERESTING) return; @@ -448,7 +449,9 @@ int main(int argc, char **argv) bisect_list = 1; continue; } - if (!strcmp(arg, "--objects")) { + if (!strncmp(arg, "--objects", 9)) { + if (!strcmp(arg+9, "=self-sufficient")) + objects_self_sufficient = 1; tag_objects = 1; tree_objects = 1; blob_objects = 1; diff 
--git a/t/t6100-rev-list-object.sh b/t/t6100-rev-list-object.sh new file mode 100644 --- /dev/null +++ b/t/t6100-rev-list-object.sh @@ -0,0 +1,97 @@ +#!/bin/sh +# +# Copyright (c) 2005 Junio C Hamano +# + +test_description='git-rev-list --objects test. + +' +. ./test-lib.sh + +GIT_AUTHOR_DATE='+0000 946684801' +GIT_AUTHOR_NAME=none +GIT_AUTHOR_EMAIL=none@none +GIT_COMMITTER_DATE='+0000 946684801' +GIT_COMMITTER_NAME=none +GIT_COMMITTER_EMAIL=none@none +export GIT_AUTHOR_DATE GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL \ + GIT_COMMITTER_DATE GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL + +_x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]' +_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40" +sedScript='s/^\('"$_x40"' [^ ]*\) .*/\1/p' + +test_expect_success setup ' + for i in frotz nitfol + do + echo $i >$i && + git-update-cache --add $i || exit + done && + tree0=$(git-write-tree) && + commit0=$(git-commit-tree $tree0) && + echo $tree0 && + echo $commit0 && + git-ls-tree -r $tree0 && + echo nitfol nitfol >nitfol && + git-update-cache --add nitfol && + tree1=$(git-write-tree) && + commit1=$(git-commit-tree $tree1 -p $commit0) && + echo $tree1 && + echo $commit1 && + git-ls-tree -r $tree1 +' </dev/null + +test_expect_success 'pack #0' ' + name0=$(git-rev-list --objects $commit0 | \ + git-pack-objects pk0) && + ls pk0-* && + git-verify-pack -v pk0-$name0.idx | + sed -ne "$sedScript" | sort >contents.0 +' + +test_expect_success 'pack #1 (commit 1 except commit 0)' ' + name1=$(git-rev-list --objects $commit1 ^$commit0 | \ + git-pack-objects pk1) && + ls pk1-* && + git-verify-pack -v pk1-$name1.idx | + sed -ne "$sedScript" | sort >contents.1 +' + +test_expect_success 'there should not be any overlaps' ' + case $(comm -12 contents.0 contents.1 | wc -l) in + 0) ;; + *) false ;; + esac +' + +test_expect_success 'pack #2 (commit 1 unpacked only)' ' + ln pk0-* .git/objects/pack/. 
&& + name2=$(git-rev-list --objects --unpacked $commit1 | \ + git-pack-objects pk2) && + ls pk2-* && + git-verify-pack -v pk1-$name2.idx | + sed -ne "$sedScript" | sort >contents.2 +' + +test_expect_success 'pack #1 and #2 should be the same' ' + diff contents.1 contents.2 +' + +test_expect_success 'pack #3 (commit 1 except commit 0, self-sufficient)' ' + name3=$(git-rev-list --objects=self-sufficient $commit1 ^$commit0 | \ + git-pack-objects pk3) && + ls pk3-* && + git-verify-pack -v pk3-$name3.idx | + sed -ne "$sedScript" | sort >contents.3 +' + +ls_tree_to_invent='s/^[0-9]* \([^ ]*\) \('"$_x40"'\) .*/\2 \1/' +test_expect_success 'make sure pack #3 is not missing anything from commit1' ' + ( + echo "$tree1 tree" + echo "$commit1 commit" + git-ls-tree "$tree1" | sed -e "$ls_tree_to_invent" + ) | sort >tree-contents.1 && + comm -23 tree-contents.1 contents.3 >missing.3 && + diff /dev/null missing.3 +' ------------
* Re: [PATCH] rev-list: add "--objects=self-sufficient" flag. 2005-07-07 23:50 ` [PATCH] rev-list: add "--objects=self-sufficient" flag Junio C Hamano @ 2005-07-07 23:58 ` Linus Torvalds 2005-07-08 1:02 ` [PATCH] rev-list: add "--full-objects" flag Junio C Hamano 2005-07-08 1:03 ` [PATCH] Give --full-objects flag to rev-list when preparing a dumb server Junio C Hamano 0 siblings, 2 replies; 66+ messages in thread From: Linus Torvalds @ 2005-07-07 23:58 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Thu, 7 Jul 2005, Junio C Hamano wrote: > > - if (!strcmp(arg, "--objects")) { > + if (!strncmp(arg, "--objects", 9)) { > + if (!strcmp(arg+9, "=self-sufficient")) > + objects_self_sufficient = 1; This is nasty - if you mis-spell "self-sufficient" (easy enough to do) you'll never know the end result isn't what you expected. It won't warn you in any way, it will just make a non-self-sufficient pack.. Linus
* [PATCH] rev-list: add "--full-objects" flag. 2005-07-07 23:58 ` Linus Torvalds @ 2005-07-08 1:02 ` Junio C Hamano 2005-07-08 1:33 ` Linus Torvalds 2005-07-08 1:46 ` Linus Torvalds 2005-07-08 1:03 ` [PATCH] Give --full-objects flag to rev-list when preparing a dumb server Junio C Hamano 1 sibling, 2 replies; 66+ messages in thread From: Junio C Hamano @ 2005-07-08 1:02 UTC (permalink / raw) To: Linus Torvalds; +Cc: git >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes: LT> This is nasty - if you mis-spell "self-sufficient" (easy enough to do) LT> you'll never know the end result isn't what you expected. It won't warn LT> you in any way, it will just make a non-self-sufficient pack.. Again you are right. How about --full-objects instead? ------------ When --full-objects is specified instead of usual "--objects", rev-list shows all objects reachable from trees associated with the commits in its output. This can be used to ensure that a single pack can be used to recreate the tree associated with every commit in it. 
Signed-off-by: Junio C Hamano <junkio@cox.net> --- rev-list.c | 13 +++++- t/t6100-rev-list-object.sh | 98 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 109 insertions(+), 2 deletions(-) create mode 100644 t/t6100-rev-list-object.sh 24c31c0417a54a6ca6dc1b86267bccbbfe87c7d8 diff --git a/rev-list.c b/rev-list.c --- a/rev-list.c +++ b/rev-list.c @@ -17,6 +17,7 @@ static const char rev_list_usage[] = " --min-age=epoch\n" " --bisect\n" " --objects\n" + " --full-objects\n" " --unpacked\n" " --header\n" " --pretty\n" @@ -27,6 +28,7 @@ static int bisect_list = 0; static int tag_objects = 0; static int tree_objects = 0; static int blob_objects = 0; +static int objects_self_sufficient = 0; static int verbose_header = 0; static int show_parents = 0; static int hdr_termination = 0; @@ -198,7 +200,7 @@ static void mark_tree_uninteresting(stru struct object *obj = &tree->object; struct tree_entry_list *entry; - if (!tree_objects) + if (!tree_objects || objects_self_sufficient) return; if (obj->flags & UNINTERESTING) return; @@ -448,7 +450,14 @@ int main(int argc, char **argv) bisect_list = 1; continue; } - if (!strcmp(arg, "--objects")) { + if (!strncmp(arg, "--objects", 9)) { + tag_objects = 1; + tree_objects = 1; + blob_objects = 1; + continue; + } + if (!strncmp(arg, "--full-objects", 9)) { + objects_self_sufficient = 1; tag_objects = 1; tree_objects = 1; blob_objects = 1; diff --git a/t/t6100-rev-list-object.sh b/t/t6100-rev-list-object.sh new file mode 100644 --- /dev/null +++ b/t/t6100-rev-list-object.sh @@ -0,0 +1,98 @@ +#!/bin/sh +# +# Copyright (c) 2005 Junio C Hamano +# + +test_description='git-rev-list --objects test. + +' +. 
./test-lib.sh + +GIT_AUTHOR_DATE='+0000 946684801' +GIT_AUTHOR_NAME=none +GIT_AUTHOR_EMAIL=none@none +GIT_COMMITTER_DATE='+0000 946684801' +GIT_COMMITTER_NAME=none +GIT_COMMITTER_EMAIL=none@none +export GIT_AUTHOR_DATE GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL \ + GIT_COMMITTER_DATE GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL + +_x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]' +_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40" +sedScript='s/^\('"$_x40"' [^ ]*\) .*/\1/p' + +test_expect_success setup ' + for i in frotz nitfol + do + echo $i >$i && + git-update-cache --add $i || exit + done && + tree0=$(git-write-tree) && + commit0=$(git-commit-tree $tree0) && + echo $tree0 && + echo $commit0 && + git-ls-tree -r $tree0 && + echo nitfol nitfol >nitfol && + rm -f frotz && + git-update-cache --add nitfol --remove frotz && + tree1=$(git-write-tree) && + commit1=$(git-commit-tree $tree1 -p $commit0) && + echo $tree1 && + echo $commit1 && + git-ls-tree -r $tree1 +' </dev/null + +test_expect_success 'pack #0' ' + name0=$(git-rev-list --objects $commit0 | \ + git-pack-objects pk0) && + ls pk0-* && + git-verify-pack -v pk0-$name0.idx | + sed -ne "$sedScript" | sort >contents.0 +' + +test_expect_success 'pack #1 (commit 1 except commit 0)' ' + name1=$(git-rev-list --objects $commit1 ^$commit0 | \ + git-pack-objects pk1) && + ls pk1-* && + git-verify-pack -v pk1-$name1.idx | + sed -ne "$sedScript" | sort >contents.1 +' + +test_expect_success 'there should not be any overlaps' ' + case $(comm -12 contents.0 contents.1 | wc -l) in + 0) ;; + *) false ;; + esac +' + +test_expect_success 'pack #2 (commit 1 unpacked only)' ' + ln pk0-* .git/objects/pack/. 
&& + name2=$(git-rev-list --objects --unpacked $commit1 | \ + git-pack-objects pk2) && + ls pk2-* && + git-verify-pack -v pk2-$name2.idx | + sed -ne "$sedScript" | sort >contents.2 +' + +test_expect_success 'pack #1 and #2 should be the same' ' + diff contents.1 contents.2 +' + +test_expect_success 'pack #3 (commit 1 except commit 0, self-sufficient)' ' + name3=$(git-rev-list --full-objects $commit1 ^$commit0 | \ + git-pack-objects pk3) && + ls pk3-* && + git-verify-pack -v pk3-$name3.idx | + sed -ne "$sedScript" | sort >contents.3 +' + +ls_tree_to_invent='s/^[0-9]* \([^ ]*\) \('"$_x40"'\) .*/\2 \1/' +test_expect_success 'make sure pack #3 is not missing anything from commit1' ' + ( + echo "$tree1 tree" + echo "$commit1 commit" + git-ls-tree "$tree1" | sed -e "$ls_tree_to_invent" + ) | sort >tree-contents.1 && + comm -23 tree-contents.1 contents.3 >missing.3 && + diff /dev/null missing.3 +' ------------ ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-08 1:02 ` [PATCH] rev-list: add "--full-objects" flag Junio C Hamano @ 2005-07-08 1:33 ` Linus Torvalds 2005-07-08 1:46 ` Linus Torvalds 1 sibling, 0 replies; 66+ messages in thread From: Linus Torvalds @ 2005-07-08 1:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Thu, 7 Jul 2005, Junio C Hamano wrote: > > Again you are right. How about --full-objects instead? I don't mind the "--objects=xxx" format per se, but it would need to verify that the "=xxx" was either valid or wasn't there at all. So what I objected to was not that it was easy to mis-spell, but that if misspelled, the program wouldn't point it out as an error, but silently just do the wrong thing. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
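Linus's objection is about a parser that silently accepts near-misses. For comparison, the patch above matches with strncmp(arg, "--objects", 9), which also accepts something like "--objectsfoo". Below is a minimal sketch of the stricter check he describes. It assumes a hypothetical "--objects=full" spelling purely for illustration; that value is not an actual rev-list option, and the enum and function names are made up.

```c
#include <string.h>

/* Hypothetical modes, for illustration only; "full" is not a real rev-list value. */
enum objects_mode { OBJECTS_NONE, OBJECTS_PLAIN, OBJECTS_FULL };

/*
 * Strictly parse "--objects" and "--objects=<value>".
 * Returns 1 if arg is a valid form (and sets *mode),
 *         0 if arg is some other option entirely,
 *        -1 if arg starts with "--objects" but the rest is invalid,
 *           so the caller can report a usage error instead of guessing.
 */
int parse_objects_option(const char *arg, enum objects_mode *mode)
{
	static const char prefix[] = "--objects";
	const size_t len = sizeof(prefix) - 1;

	if (strncmp(arg, prefix, len))
		return 0;		/* not an --objects option at all */
	if (arg[len] == '\0') {		/* exactly "--objects" */
		*mode = OBJECTS_PLAIN;
		return 1;
	}
	if (arg[len] != '=')
		return -1;		/* e.g. "--objectsfoo": refuse, don't guess */
	if (!strcmp(arg + len + 1, "full")) {
		*mode = OBJECTS_FULL;
		return 1;
	}
	return -1;			/* "--objects=" followed by an unknown value */
}
```

In an argument loop, a caller would treat -1 as a usage error rather than falling through to the next check, which is exactly the "point it out as an error" behaviour Linus asks for.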
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-08 1:02 ` [PATCH] rev-list: add "--full-objects" flag Junio C Hamano 2005-07-08 1:33 ` Linus Torvalds @ 2005-07-08 1:46 ` Linus Torvalds 2005-07-08 2:17 ` Junio C Hamano 1 sibling, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2005-07-08 1:46 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Thu, 7 Jul 2005, Junio C Hamano wrote: > > When --full-objects is specified instead of usual "--objects", > rev-list shows all objects reachable from trees associated with > the commits in its output. This can be used to ensure that a > single pack can be used to recreate the tree associated with > every commit in it. Hmm.. The more I think about it, the less I think this is about "full objects". After all, we won't have all objects: the pack will still cut off any commits that may be reachable but not interesting. So this is more specifically about full _trees_, not objects per se. So while the name of the option doesn't really matter all that much, I do think it would make more sense as "--whole-trees" or something like that. However, I really don't think it's a very useful option in the first place. Any dumb web-based thing that depends on "--whole-trees" would suck horribly. For the kernel, it means that you'd be guaranteed 17,000+ files, and there would be very few deltas in there, so you'd have this 40MB+ pack-file. Which is _not_ an acceptable way of getting updates. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-08 1:46 ` Linus Torvalds @ 2005-07-08 2:17 ` Junio C Hamano 2005-07-08 2:39 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2005-07-08 2:17 UTC (permalink / raw) To: Linus Torvalds; +Cc: git >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes: LT> However, I really don't think it's a very useful option in LT> the first place. Any dumb web-based thing that depends on LT> "--whole-trees" would suck horribly. I agree with these two sentences now. However it does not automatically mean that the avenue I have been pursuing would not work; the server side preparation needs to be a bit more careful than what I sent, which unconditionally runs "prune-packed". It instead should leave the files that "--whole-trees" would have packed as plain SHA1 files, so that the bulk is obtained by statically generated packs and the rest can be handled in the commit-chain walker as before. So, the server side preparation needs to be tweaked to do something like: (1) Repack when necessary (no --whole-trees). (2) For each .git/objects/pack/ pack, make a list of trees and blobs that are missing from the commits that are contained in the same pack. (3) Run "prune-packed" but do not prune objects on the list produced in the previous step. (4) Take inventory, rev-cache, and pack, as done by the posted patch. The determination of (1) is a bit problematic since "when necessary" is not "when .git/objects/?? grew too big" anymore, due to the fact that (3) would deliberately leave plain SHA1 files there. A completely different way would be to prepare packs of objects based on age, and create an inventory of such packs. Have client download such an inventory, which essentially says "if you have this commit, then slurp these packs and you are done." ^ permalink raw reply [flat|nested] 66+ messages in thread
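Junio's alternative ("if you have this commit, then slurp these packs") can be sketched as a lookup on the client side. This is only an illustration under stated assumptions: the inventory record below is hypothetical (git defines no such format here), and it assumes the inventory is ordered newest commit first, so the first commit the client already has yields the smallest download.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical inventory record: not a format git actually defines. */
struct inventory_entry {
	const char *if_you_have;	/* commit name the client may already own */
	const char *const *packs;	/* NULL-terminated pack names to download */
};

/* Does the client's list of known commits contain this commit? */
static int client_has(const char *const *have, size_t nhave, const char *commit)
{
	for (size_t i = 0; i < nhave; i++)
		if (!strcmp(have[i], commit))
			return 1;
	return 0;
}

/*
 * Return the pack list of the first entry whose commit the client already
 * has, or NULL if none matches (the caller would then fall back, e.g. to
 * fetching every pack). Assumes inv[] is ordered newest commit first.
 */
const char *const *packs_to_fetch(const struct inventory_entry *inv, size_t ninv,
				  const char *const *have, size_t nhave)
{
	for (size_t i = 0; i < ninv; i++)
		if (client_has(have, nhave, inv[i].if_you_have))
			return inv[i].packs;
	return NULL;
}
```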
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-08 2:17 ` Junio C Hamano @ 2005-07-08 2:39 ` Linus Torvalds 2005-07-09 21:09 ` Eric W. Biederman [not found] ` <7vy88gzn6s.fsf@assigned-by-dhcp.cox.net> 0 siblings, 2 replies; 66+ messages in thread From: Linus Torvalds @ 2005-07-08 2:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Thu, 7 Jul 2005, Junio C Hamano wrote: > > However it does not automatically mean that the avenue I have > been pursuing would not work; the server side preparation needs > to be a bit more careful than what I sent, which unconditionally > runs "prune-packed". It instead should leave the files that > "--whole-trees" would have packed as plain SHA1 files, so that > the bulk is obtained by statically generated packs and the rest > can be handled in the commit-chain walker as before. I really think the commit-chain walker needs to run locally (ie at the server end, or after fetching all the objects from the server). I don't know how much you've tried out the git-http-pull and git-ssh-pull things, but their performance was quite horrid for anything half-way bigger, because of the totally synchronized IO. The "fetch one object, parse it, fetch the next one, parse that.." approach is just horrible. I ended up preferring the "rsync" thing even though rsync sucked badly on big object stores too, if only because when rsync got working, it at least nicely pipelined the transfers, and would transfer things ten times faster than git-ssh-pull did (maybe I'm exaggerating, but I don't think so, it really felt that way). And the thing is, if you purely follow one tree (which is likely the common case for a lot of users), then you are actually always likely better off with the "mirror it" model. Which is _not_ a good model for developers (for example, me rsync'ing from Jeff's kernel repository always got me hundreds of useless objects), but it's fine for somebody who actually just wants to track somebody else. 
And then you really can use just rsync or wget or ncftpget or anything else that has a "fetch recursively, optimizing existing objects" mode. Now, re-packing ends up causing some double transmissions, but I bet the cost of those is going to be less than the cost of the "ping-pong for each object" approach. Especially as most of the repacked objects will be deltas if the repacking is done properly. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-08 2:39 ` Linus Torvalds @ 2005-07-09 21:09 ` Eric W. Biederman 2005-07-10 5:11 ` Linus Torvalds ` (2 more replies) [not found] ` <7vy88gzn6s.fsf@assigned-by-dhcp.cox.net> 1 sibling, 3 replies; 66+ messages in thread From: Eric W. Biederman @ 2005-07-09 21:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, git Linus Torvalds <torvalds@osdl.org> writes: > On Thu, 7 Jul 2005, Junio C Hamano wrote: >> >> However it does not automatically mean that the avenue I have >> been pursuing would not work; the server side preparation needs >> to be a bit more careful than what I sent, which unconditionally >> runs "prune-packed". It instead should leave the files that >> "--whole-trees" would have packed as plain SHA1 files, so that >> the bulk is obtained by statically generated packs and the rest >> can be handled in the commit-chain walker as before. > The "fetch one object, parse it, fetch the next one, parse that.." > approach is just horrible. Agreed. That does not cover up latency at all and depending on the parsing cost can potentially even keep you from having anything on your network connection for a noticeable amount of time. > I ended up preferring the "rsync" thing even though rsync sucked badly on > big object stores too, if only because when rsync got working, it at least > nicely pipelined the transfers, and would transfer things ten times faster > than git-ssh-pull did (maybe I'm exaggerating, but I don't think so, it > really felt that way). This feels to me like an implementation issue (no pipelining) rather than a design issue (pipelining is impossible). > And the thing is, if you purely follow one tree (which is likely the > common case for a lot of users), then you are actually always likely > better off with the "mirror it" model. 
Which is _not_ a good model for > developers (for example, me rsync'ing from Jeff's kernel repository always > got me hundreds of useless objects), but it's fine for somebody who > actually just wants to track somebody else. I assume the problem with the mirror it model was simply there were to many objects? > And then you really can use just rsync or wget or ncftpget or anything > else that has a "fetch recursively, optimizing existing objects" mode. Sane. But with an intelligent fetcher and a little extra information a dumb server should still be able to not fetch branches we care nothing about. I think that extra information is simply commit object graph and which packs those commit objects are in. I assume the commit graph information will be fairly modest. Once you have that extra information you can generate incremental packs whenever you upload to the server, and you can make the incremental packs per branch. That should allow a dumb fetcher to look at the list of commits and just fetch those packs it cares about, and since it only has to look one place first it should be fairly sane. The core idea is that if the dumb-server-preparation can anticipate common access patterns (mirror a branch) and give enough information so that this can be done cheaply and pipelined, I don't expect it to be much worse than an intelligent fetcher. The current intelligent fetch currently has a problem that it cannot be used to bootstrap a repository. If you don't have an ancestor of what you are fetching you can't fetch it. Eric ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-09 21:09 ` Eric W. Biederman @ 2005-07-10 5:11 ` Linus Torvalds 2005-07-10 6:28 ` Junio C Hamano 2005-07-10 21:48 ` Sven Verdoolaege 2005-07-10 22:36 ` Linus Torvalds 2 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2005-07-10 5:11 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Junio C Hamano, git On Sat, 9 Jul 2005, Eric W. Biederman wrote: > > I assume the problem with the mirror it model was simply there were > to many objects? Yes. > > And then you really can use just rsync or wget or ncftpget or anything > > else that has a "fetch recursively, optimizing existing objects" mode. > > Sane. But with an intelligent fetcher and a little extra information a > dumb server should still be able to not fetch branches we care nothing > about. I think that extra information is simply commit object graph and > which packs those commit objects are in. I assume the commit graph > information will be fairly modest. Well, what I'd hope for is actually that eventually "webgit" will have some machine-parseable sub-tree, and then you can have this kind of thing generated automatically. But a _truly_ dumb server (ie one with no CGI at all, just "raw data", you really end up with just effectively rsyncing it. Yes, you could create a new "commit index file" every time you push, and maybe it's worth it, but on the other hand, what's wrong with just rsyncing it all and parsing it locally instead? People who use it for major development would all try to get the smart client, even if it's "just" some webgit extension thing.. Dumb servers work, they just won't do any selective stuff. Big deal. That's why they are dumb. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-10 5:11 ` Linus Torvalds @ 2005-07-10 6:28 ` Junio C Hamano 0 siblings, 0 replies; 66+ messages in thread From: Junio C Hamano @ 2005-07-10 6:28 UTC (permalink / raw) To: Linus Torvalds; +Cc: Eric W. Biederman, git >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes: >> On Sat, 9 Jul 2005, Eric W. Biederman wrote: >> I assume the commit graph information will be fairly modest. That is true. In my experience with the one I have been cooking, the Gitified 2.4.0->2.6.12-rc2 BKCVS export results in a bit shy of 600KB of commit ancestry information. The full development trail for that repository contains 370152 objects among which 28237 are commits; when packed into one pack-idx pair, it is around a 170MB .pack with a 9MB .idx file. LT> But a _truly_ dumb server (ie one with no CGI at all, just "raw data", you LT> really end up with just effectively rsyncing it. Yes, you could create a LT> new "commit index file" every time you push, and maybe it's worth it, but LT> on the other hand, what's wrong with just rsyncing it all and parsing it LT> locally instead? Nothing, and you convinced me to drop the one I have been cooking. Maybe it's time to change git-fetch-script to use wget -r for the objects part of the http transport? ^ permalink raw reply [flat|nested] 66+ messages in thread
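Junio's 9MB .idx figure is consistent with the pack index layout of the time. The sketch below does the arithmetic, assuming the original (version 1) .idx format: a 256-entry fan-out table of 4-byte counts, one 24-byte entry per object (4-byte pack offset plus 20-byte SHA1), and a trailer of two 20-byte SHA1 checksums. It does not try to model the 600KB ancestry file, whose format is not described here.

```c
#include <stdint.h>

/*
 * Size in bytes of a version-1 pack index (.idx):
 *   fan-out table: 256 entries x 4 bytes
 *   per object:    4-byte pack offset + 20-byte SHA1
 *   trailer:       SHA1 of the .pack + SHA1 of the .idx itself
 */
uint64_t idx_v1_size(uint64_t nr_objects)
{
	const uint64_t fanout = 256 * 4;
	const uint64_t per_object = 4 + 20;
	const uint64_t trailer = 20 + 20;

	return fanout + nr_objects * per_object + trailer;
}
```

For 370152 objects this gives 1024 + 370152 * 24 + 40 = 8884712 bytes, about 8.9MB, matching the "9MB .idx" reported above.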
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-09 21:09 ` Eric W. Biederman 2005-07-10 5:11 ` Linus Torvalds @ 2005-07-10 21:48 ` Sven Verdoolaege 2005-07-10 22:36 ` Linus Torvalds 2 siblings, 0 replies; 66+ messages in thread From: Sven Verdoolaege @ 2005-07-10 21:48 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, Junio C Hamano, git On Sat, Jul 09, 2005 at 03:09:02PM -0600, Eric W. Biederman wrote: > The current intelligent fetch currently has a problem that it cannot > be used to bootstrap a repository. If you don't have an ancestor > of what you are fetching you can't fetch it. > Not sure if this is what you want, but you could use the following gitweb patch (to be applied on top of my previous patches) to get a git tree snapshot for bootstrapping. http://www.liacs.nl/~sverdool/gitweb.cgi?p=gitweb.git;a=summary http://www.liacs.nl/~sverdool/gitweb.git/ skimo -- Support pack snapshots. --- commit f76a442a0e2166b3f17db0e496545a600a33f94c tree f8f089ab738864e69e0155b10262dbec832b4a11 parent 8392280de17a89a451c1f7db4e268f2047d4aa83 author Sven Verdoolaege <skimo@liacs.nl> Sun, 10 Jul 2005 23:56:42 +0200 committer Sven Verdoolaege <skimo@liacs.nl> Sun, 10 Jul 2005 23:56:42 +0200 gitweb.cgi | 11 ++++++++--- 1 files changed, 8 insertions(+), 3 deletions(-) diff --git a/gitweb.cgi b/gitweb.cgi --- a/gitweb.cgi +++ b/gitweb.cgi @@ -2058,8 +2058,9 @@ sub git_snapshot { "<th></th>\n" . 
"</tr>\n"; my %types = ( - 'Bzipped tar archive' => 'tar.bz2', - 'Gzipped tar archive' => 'tar.gz', + 'Source tree (bzipped tar archive)' => 'tar.bz2', + 'Source tree (gzipped tar archive)' => 'tar.gz', + 'Git tree (pack file)' => 'pack', ); my $alternate = 0; for my $type (sort keys %types) { @@ -2094,6 +2095,7 @@ sub git_serve_snapshot { my %info = ( 'tar.bz2' => [ 'application/x-bzip2', 'bzip2' ], 'tar.gz' => [ 'application/x-gzip', 'gzip' ], + 'pack' => [ 'application/x-git-pack' ], ); if (!exists $info{$st}) { die_error(undef, "Unknown snapshot type."); @@ -2101,7 +2103,10 @@ sub git_serve_snapshot { my ($type, $zip) = @{$info{$st}}; print $cgi->header(-type => $type, -attachment => "$project-$hash.$st"); - open my $fd, "-|", "$gitbin/git-tar-tree $hash '$project-$hash' | $zip" + open my $fd, "-|", ($st eq 'pack' ? + "$gitbin/git-rev-list --max-count=1 --objects $hash | ". + "$gitbin/git-pack-objects --stdout" : + "$gitbin/git-tar-tree $hash '$project-$hash' | $zip") or return; undef $/; print <$fd>; ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-09 21:09 ` Eric W. Biederman 2005-07-10 5:11 ` Linus Torvalds 2005-07-10 21:48 ` Sven Verdoolaege @ 2005-07-10 22:36 ` Linus Torvalds 2005-07-11 15:19 ` Eric W. Biederman 2 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2005-07-10 22:36 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Junio C Hamano, git On Sat, 9 Jul 2005, Eric W. Biederman wrote: > > The current intelligent fetch currently has a problem that it cannot > be used to bootstrap a repository. If you don't have an ancestor > of what you are fetching you can't fetch it. Sure you can. See the current "git clone". It's actually quite good, it's a pleasure to use now that it gives updates on how much it has done. Just do git clone src dest to try it out. It starts out silent (for big repositories) because it takes a while to get the whole rev list, but once it gets going it's quite nice and gives a nice progress report.. It uses the exact same server side code that "git-fetch-pack" does (ie it just starts "git-upload-pack" on the server). Now, one thing you cannot do is to start a totally new _project_ on the server side. In order to do a "git-send-pack", you need to first create a directory and do a "git-init-db" on the remote side. So to create a new project, what you need to do is src$ ssh target target$ mkdir new-project target$ cd new-project target$ git-init-db target$ exit src$ git-send-pack target:new-project master and you've now sent your "master" branch to the new project at "target:new-project". You can even populate multiple branches at a time: just list them all (you do have to list them, because by default "git-send-pack" will update the _common_ branches, and since the other end is empty, there obviously are no common branches to start with). 
Ahh, you should even be able to automate the sending of all branches by doing git-send-pack target:new-project $(cd .git ; find refs -type f) I think - that will end up being equivalent to a "reverse clone". The smart clients are doing pretty damn well, I think. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-10 22:36 ` Linus Torvalds @ 2005-07-11 15:19 ` Eric W. Biederman 2005-07-11 16:38 ` Linus Torvalds 2005-07-11 17:53 ` Linus Torvalds 0 siblings, 2 replies; 66+ messages in thread From: Eric W. Biederman @ 2005-07-11 15:19 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, git Linus Torvalds <torvalds@osdl.org> writes: > On Sat, 9 Jul 2005, Eric W. Biederman wrote: >> >> The current intelligent fetch currently has a problem that it cannot >> be used to bootstrap a repository. If you don't have an ancestor >> of what you are fetching you can't fetch it. > > Sure you can. > > See the current "git clone". It's actually quite good, it's a pleasure to > use now that it gives updates on how much it has done. > > Just do > > git clone src dest Sorry, somehow I just missed that, and then I noticed just a little before you sent out your email. I'm having the worst time putting together a mental model of how git works, and the documentation is spotty enough that it hasn't been helpful. So I am wading through the code. It seems every time I turn a corner there is another rough spot. I guess I was expecting to pull from one tree into another unrelated tree. Getting a tree with two heads and then be able to merge them together. A couple of questions. 1) Does git-clone-script when packed copy the entire repository or just take a couple of slices of the tree where you have references? 2) Is there a way for a pack to create deltas against objects that are not in the tree? For a dumb repository making incremental changes this is ideal. Eric ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-11 15:19 ` Eric W. Biederman @ 2005-07-11 16:38 ` Linus Torvalds 2005-07-12 0:44 ` Eric W. Biederman 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2005-07-11 16:38 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Junio C Hamano, git On Mon, 11 Jul 2005, Eric W. Biederman wrote: > > I guess I was expecting to pull from one tree into another unrelated > tree. Getting a tree with two heads and then be able to merge them > together. You can do it, but you have to do it by hand. It's a valid operation, but it's not an operation I want people to do by mistake, so it's not something the trivial helper scripts help with. The way to do it by hand is to just use something stupid that doesn't understand what it's doing anyway, and just copy the files over. "cp -a" or "rsync" works fine. Then just do "git resolve" by hand. It's not very hard at all, but it's definitely something that should be a special case. > A couple of questions. > > 1) Does git-clone-script when packed copy the entire repository > or just take a couple of slices of the tree where you have > references? It only gets the objects needed for the references, nothing more. So if you only get one branch, it will leave the objects that are specific to other branches alone. > 2) Is there a way for a pack to create deltas against objects > that are not in the tree? For a dumb repository making incremental > changes this is ideal. A pack can only have deltas against objects in that pack. It can't even have deltas to other objects in the same tree, it literally is only _within_ a pack. This is so that each pack is totally independent: you can always unpack (and verify) the objects in a pack _without_ having anything else (of course, the end result is often not a full project, and you won't have any references, but at least the _objects_ are valid).
I don't want to have deltas to outside the pack, because while it's obviously very nice from a size packing standpoint, it's totally horrid from an infrastructure standpoint. It would make it possible to have circular dependencies (ie deltas against each other) that could only be resolved by having a third pack (or the unpacked object). It would also mean that you may have to have two packs mapped at the same time to unpack them, which was very much against what I was aiming for: I think that in the long run, for truly huge projects, you'd want to have a history of packs, each maybe a gigabyte in size, and you may be in the situation that you simply cannot have two packs mapped at the same time because you don't have enough virtual memory for it. So then inter-pack deltas would mean that you'd have to have "partial pack mapping" etc horrid special case logic. Right now, because a pack is always self-sufficient, you know that in order to unpack an object, if you find it in the index file, you will be able to unpack it by just mapping that pack and going off.. So the rule is: don't pack too often. The unpacked objects are actually working really really well as long as you don't have tens of thousands of them. Having a few hundred (or even a few thousand) unpacked objects is not a problem at all. Then you do a "git repack" when it starts getting uncomfortable, and you continue. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
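The self-sufficiency rule Linus describes reduces to a simple check: every delta in a pack must name a base that is also in that same pack. The sketch below works over a hypothetical, simplified in-memory view of a pack. The struct is illustrative, not git's on-disk format or its internal types, and a real index is sorted so lookups would use binary search instead of this linear scan.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one pack entry (not git's real structs). */
struct pack_entry {
	const char *name;	/* object name (hex SHA1) */
	const char *delta_base;	/* NULL if stored whole, else name of its base */
};

/* Is an object with this name stored in the same pack? */
static int in_pack(const struct pack_entry *e, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(e[i].name, name))
			return 1;
	return 0;
}

/*
 * Return the index of the first entry whose delta base lives outside
 * the pack (which would break self-sufficiency), or -1 if there is none.
 */
long first_external_delta(const struct pack_entry *e, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (e[i].delta_base && !in_pack(e, n, e[i].delta_base))
			return (long)i;
	return -1;
}
```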
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-11 16:38 ` Linus Torvalds @ 2005-07-12 0:44 ` Eric W. Biederman 2005-07-12 1:14 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Eric W. Biederman @ 2005-07-12 0:44 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, git Linus Torvalds <torvalds@osdl.org> writes: > On Mon, 11 Jul 2005, Eric W. Biederman wrote: >> >> I guess I was expecting to pull from one tree into another unrelated >> tree. Getting a tree with two heads and then be able to merge them >> together. > > You can do it, but you have to do it by hand. It's a valid operation, but > it's not an operation I want people to do by mistake, so it's not > something the trivial helper scripts help with. > > The way to do it by hand is to just use something stupid that doesn't > understand what it's doing anyway, and just copy the files over. "cp -a" > or "rsync" works fine. Then just do "git resolve" by hand. It's not very > hard at all, but it's definitely something that should be a special case. Ok. Only the dumb methods are allowed. >> A couple of questions. >> >> 1) Does git-clone-script when packed copy the entire repository >> or just take a couple of slices of the tree where you have >> references? > > It only gets the objects needed for the references, nothing more. > > So if you only get one branch, it will leave the objects that are specific > to other branches alone. Hmm. As I recall reading the code it grabs everything that is in .git/refs/*. So I would actually expect it to grab all of the branches. My real question was different. With a clone it appears to just get the objects used to compose a tree object, but none of the history available by looking at the commit parents is obtained. Not at all what I would expect for an operation named clone. Eric ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-12 0:44 ` Eric W. Biederman @ 2005-07-12 1:14 ` Linus Torvalds 2005-07-12 2:38 ` Eric W. Biederman 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2005-07-12 1:14 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Junio C Hamano, git On Mon, 11 Jul 2005, Eric W. Biederman wrote: > > Ok. Only the dumb methods are allowed. Well, no, you can actually do git-clone-pack by hand in that git archive, and it will use the smart packing to get the other end, even if it is totally unrelated to the current project. But you have to do it by "hand" in the sense that none of the nice helper scripts will help you to do this. Merging two unrelated projects really is a very special operation. I've done it once (gitk into git), and I don't think we'll see it done very many times again. > > So if you only get one branch, it will leave the objects that are specific > > to other branches alone. > > Hmm. As I recall reading the code it grabs everything that is > in .git/refs/*. Only by default. If you specify a branch (or five) git-clone-pack will grab only that branch. However, I don't think "git clone" (the script) even exposes that, so right now you'd not even see it - "git clone" only exposes the "get all the branches by default" behaviour. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-12 1:14 ` Linus Torvalds @ 2005-07-12 2:38 ` Eric W. Biederman 2005-07-12 3:21 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Eric W. Biederman @ 2005-07-12 2:38 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, git Linus Torvalds <torvalds@osdl.org> writes: > On Mon, 11 Jul 2005, Eric W. Biederman wrote: >> > So if you only get one branch, it will leave the objects that are specific >> > to other branches alone. >> >> Hmm. As I recall reading the code it grabs everything that is >> in .git/refs/*. > > Only by default. > > If you specify a branch (or five) git-clone-pack will grab only that > branch. > > However, I don't think "git clone" (the script) even exposes that, so > right now you'd not even see it - "git clone" only exposes the "get all > the branches by default" behaviour. Yep. The question: Does git-upload-pack which gets it's list of objects with "git-rev-list --objects needed1 needed2 needed3 ^has1 ^has2 ^has3" get any history beyond the top of tree of each branch. As I read the code it does not. If the code does not get the history I see some problems. In particular merging with a branch is hard because we may not pull the common history point. Eric ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-12 2:38 ` Eric W. Biederman @ 2005-07-12 3:21 ` Linus Torvalds 2005-07-12 3:39 ` Eric W. Biederman 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2005-07-12 3:21 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Junio C Hamano, git On Mon, 11 Jul 2005, Eric W. Biederman wrote: > > The question: > Does git-upload-pack which gets it's list of objects > with "git-rev-list --objects needed1 needed2 needed3 ^has1 ^has2 ^has3" > get any history beyond the top of tree of each branch. > > As I read the code it does not. It does. It gets all the history necessary for each branch. git-rev-list will walk the whole history until it hits commits that as been marked as uninteresting (or the parents of commits that have been marked as uninteresting), and those are the ones that the receiver already has, of course. So after you get a pack, you have all the history for all the branches you got. A branch you _didn't_ get, you don't get any history for, of course, but that doesn't matter. You'll get that history if you ever pull the branch later. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
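On the commit level, the effect of "git-rev-list needed ^has" can be approximated as a set difference: everything reachable from the wanted heads, minus everything reachable from what the receiver already has. The sketch below is a simplification with hypothetical structs. The real git-rev-list walks in date order and can stop once only uninteresting commits remain, whereas this version first marks the receiver's entire history.

```c
#include <stddef.h>

/* Hypothetical minimal commit node for this sketch. */
struct commit {
	const char *name;
	struct commit **parents;	/* NULL-terminated, or NULL for a root */
	unsigned flags;
};

#define UNINTERESTING 1u	/* reachable from something the receiver has */
#define SHOWN         2u	/* already emitted */

/* Mark c and all of its ancestors uninteresting (the "^has" side). */
static void mark_uninteresting(struct commit *c)
{
	if (c->flags & UNINTERESTING)
		return;
	c->flags |= UNINTERESTING;
	for (struct commit **p = c->parents; p && *p; p++)
		mark_uninteresting(*p);
}

/* Emit c and its ancestors, stopping at anything uninteresting or shown. */
static void collect(struct commit *c, struct commit **out, size_t *n, size_t max)
{
	if (c->flags & (UNINTERESTING | SHOWN))
		return;
	c->flags |= SHOWN;
	if (*n < max)
		out[(*n)++] = c;
	for (struct commit **p = c->parents; p && *p; p++)
		collect(*p, out, n, max);
}

/* Commits reachable from want[] but not from have[]; returns how many were stored. */
size_t commits_to_send(struct commit **want, size_t nwant,
		       struct commit **have, size_t nhave,
		       struct commit **out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < nhave; i++)
		mark_uninteresting(have[i]);
	for (size_t i = 0; i < nwant; i++)
		collect(want[i], out, &n, max);
	return n;
}
```

With a linear history A <- B <- C <- D, wanting D while already having B yields D and C. That is "all the history necessary for each branch" and nothing the receiver already owns.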
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-12 3:21 ` Linus Torvalds @ 2005-07-12 3:39 ` Eric W. Biederman 2005-07-12 4:48 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Eric W. Biederman @ 2005-07-12 3:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, git Linus Torvalds <torvalds@osdl.org> writes: > On Mon, 11 Jul 2005, Eric W. Biederman wrote: >> >> The question: >> Does git-upload-pack which gets it's list of objects >> with "git-rev-list --objects needed1 needed2 needed3 ^has1 ^has2 ^has3" >> get any history beyond the top of tree of each branch. >> >> As I read the code it does not. > > It does. It gets all the history necessary for each branch. git-rev-list > will walk the whole history until it hits commits that as been marked as > uninteresting (or the parents of commits that have been marked as > uninteresting), and those are the ones that the receiver already has, of > course. Ok. So the intention is sane then. Looking closer it appears that commit_list_insert is recursive and that is what I missed. > So after you get a pack, you have all the history for all the branches you > got. > > A branch you _didn't_ get, you don't get any history for, of course, but > that doesn't matter. You'll get that history if you ever pull the branch > later. Right. Things work well if you have all of the history. Eric ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-12 3:39 ` Eric W. Biederman @ 2005-07-12 4:48 ` Linus Torvalds 0 siblings, 0 replies; 66+ messages in thread From: Linus Torvalds @ 2005-07-12 4:48 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Junio C Hamano, git On Mon, 11 Jul 2005, Eric W. Biederman wrote: > > Looking closer it appears that commit_list_insert is recursive > and that is what I missed. Actually, it's "pop_most_recent_commit()" that ends up being the "recursive" part: it will pop the top-most entry, but as it is popping it it will push the parents of that entry onto the same list. So basically, you can get a list of all history by first inserting the top entry, and then doing "pop_most_recent_commit()" until the list is empty. Now, git-rev-list ends up being slightly more complex than that, since it has support for multiple starting points, and marking commits (and thus their parents) uninteresting, and two other sorting methods in addition to the default "by date" thing. And then there's all the issues about tags, trees and blobs, and their visibility as a function of the commits that are visible and the command line arguments.. In fact, it turns out that git-rev-list is really the real heart of "git". Almost everything else revolves around it. Once you grok git-rev-list, you probably really grok git. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
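The loop Linus describes (insert the top entry, then keep popping the most recent commit, which pushes its parents, until the list is empty) can be sketched as follows. The structs and function names are simplified stand-ins modeled on his description, not git's actual commit.c. A "seen" flag keeps a commit that is reachable via two paths, such as a merge base, from being queued twice.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, minimal commit node (git's real struct commit has more). */
struct commit {
	const char *name;
	unsigned long date;
	struct commit **parents;	/* NULL-terminated, or NULL for a root */
	int seen;			/* already queued once? */
};

struct commit_list {
	struct commit *item;
	struct commit_list *next;
};

/* Insert c into a list kept sorted newest-first by date. */
static void insert_by_date(struct commit *c, struct commit_list **list)
{
	struct commit_list *node = malloc(sizeof(*node));
	struct commit_list **pp = list;

	if (!node)
		abort();
	node->item = c;
	while (*pp && (*pp)->item->date >= c->date)
		pp = &(*pp)->next;
	node->next = *pp;
	*pp = node;
}

/* Pop the newest commit and queue its parents that were not queued before. */
static struct commit *pop_most_recent(struct commit_list **list)
{
	struct commit_list *top = *list;
	struct commit *c = top->item;

	*list = top->next;
	free(top);
	for (struct commit **p = c->parents; p && *p; p++) {
		if (!(*p)->seen) {
			(*p)->seen = 1;
			insert_by_date(*p, list);
		}
	}
	return c;
}

/* Store up to max commits reachable from tip in out[], in pop order. */
size_t walk_history(struct commit *tip, struct commit **out, size_t max)
{
	struct commit_list *list = NULL;
	size_t n = 0;

	tip->seen = 1;
	insert_by_date(tip, &list);
	while (list) {
		struct commit *c = pop_most_recent(&list);
		if (n < max)
			out[n++] = c;
	}
	return n;
}
```

For a merge D (date 4) with parents B (2) and C (3), both children of A (1), the walk pops D, C, B, A. A is queued once even though both B and C point to it.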
* Re: [PATCH] rev-list: add "--full-objects" flag. 2005-07-11 15:19 ` Eric W. Biederman 2005-07-11 16:38 ` Linus Torvalds @ 2005-07-11 17:53 ` Linus Torvalds 1 sibling, 0 replies; 66+ messages in thread From: Linus Torvalds @ 2005-07-11 17:53 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Junio C Hamano, git On Mon, 11 Jul 2005, Eric W. Biederman wrote: > > I'm having the worst time putting together a mental model of how git > works, and the documentation is spotty enough that it hasn't been > helpful. So I am wading through the code. It seems every time I turn > a corner there is another rough spot. Btw, I know I'm bad at writing docs, but what I _do_ enjoy doing is answering reasonably specific technical questions, and maybe somebody else can write docs by taking advantage of me that way. I tried to write the tutorial in a way that it also tries to explain how git works (not just a "do this", but a "you update the index file and then write the result out as a tree object"), but it obviously covers a fairly limited part of what git actually can do, and at the same time it doesn't go into a lot of detail. And part of that is not just my inability to write documentation, it's also that I just have the wrong "view" of the project, ie I probably just take a lot of things for granted and consider them obvious, even though they aren't, and then I probably occasionally explain things that aren't worth explaining, because either they _are_ obvious, or people just don't care and they are irrelevant. I'd love to see somebody write up more of a "this is how you use git" kind of tutorial, _and_ on the other hand more of a low-level explanation of the notion of an object store where objects refer to each other by their SHA1 names, and how that is represented in the filesystem and/or in packs. 
Something with a few pictures would be great (ie screenshots of gitk, but also something that tries to just visually show how tags point to commits that point to parents and trees, and trees pointing to other trees and then blobs). All things that I'm a complete idiot at, but that would help users visualize what the heck git is actually _doing_, so that they don't just parrot some magic command line that they don't understand, but can actually reason about what they are doing. I think a lot of people do understand this, but yes, the docs are kind of lacking. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH] Check packs and then files. [not found] ` <7vfyumj8hn.fsf_-_@assigned-by-dhcp.cox.net> @ 2005-07-11 7:00 ` Junio C Hamano 0 siblings, 0 replies; 66+ messages in thread From: Junio C Hamano @ 2005-07-11 7:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: git This reverses the order of object lookup, so that the pack index is checked first and the filesystem (the .git/objects/??/ hierarchy) only afterwards. When most of the objects are packed, this saves a lot of stat() calls and negative dcache entries. The price of this approach is negligible, even when most of the objects are outside packs, because checking the pack index file is quite cheap. Signed-off-by: Junio C Hamano <junkio@cox.net> --- sha1_file.c | 9 ++++++--- 1 files changed, 6 insertions(+), 3 deletions(-) 0394e2b0ed5b197510340f187d02ef2274b6cad2 diff --git a/sha1_file.c b/sha1_file.c --- a/sha1_file.c +++ b/sha1_file.c @@ -1035,14 +1035,17 @@ void * read_sha1_file(const unsigned cha { unsigned long mapsize; void *map, *buf; + struct pack_entry e; + if (find_pack_entry(sha1, &e)) + return read_packed_sha1(sha1, type, size); map = map_sha1_file_internal(sha1, &mapsize); if (map) { buf = unpack_sha1_file(map, mapsize, type, size); munmap(map, mapsize); return buf; } - return read_packed_sha1(sha1, type, size); + return NULL; } void *read_object_with_reference(const unsigned char *sha1, @@ -1343,9 +1346,9 @@ int has_sha1_file(const unsigned char *s struct stat st; struct pack_entry e; - if (find_sha1_file(sha1, &st)) + if (find_pack_entry(sha1, &e)) return 1; - return find_pack_entry(sha1, &e); + return find_sha1_file(sha1, &st) ? 1 : 0; } int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object, const char *type) ^ permalink raw reply [flat|nested] 66+ messages in thread
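The claim that checking the pack index is cheap rests on the index being a sorted table of SHA1s with a 256-entry cumulative fan-out table in front of it, so a lookup is one bucket selection plus a short binary search, with no stat() at all. The sketch below models that in memory, assuming that layout; struct toy_pack_index, build_fanout() and find_in_index() are hypothetical names, and the on-disk details (byte order, how offsets and names are packed together) are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN 20

/* In-memory model of a pack index (hypothetical, not git's struct). */
struct toy_pack_index {
	uint32_t fanout[256];                   /* #entries with first byte <= b */
	const unsigned char (*sha1)[SHA1_LEN];  /* object names, sorted ascending */
	const uint32_t *offset;                 /* where each object sits in the .pack */
};

/* Build the cumulative fan-out table from a sorted SHA1 array. */
static void build_fanout(struct toy_pack_index *idx, size_t nr)
{
	memset(idx->fanout, 0, sizeof(idx->fanout));
	for (size_t i = 0; i < nr; i++) {
		assert(i == 0 || memcmp(idx->sha1[i - 1], idx->sha1[i], SHA1_LEN) < 0);
		idx->fanout[idx->sha1[i][0]]++;
	}
	for (int b = 1; b < 256; b++)
		idx->fanout[b] += idx->fanout[b - 1];
}

/* Return 1 and set *ofs if sha1 is in the index, else 0.  The fan-out
 * narrows the search to names sharing the first byte. */
static int find_in_index(const struct toy_pack_index *idx,
			 const unsigned char *sha1, uint32_t *ofs)
{
	uint32_t lo = sha1[0] ? idx->fanout[sha1[0] - 1] : 0;
	uint32_t hi = idx->fanout[sha1[0]];

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(sha1, idx->sha1[mid], SHA1_LEN);

		if (!cmp) {
			*ofs = idx->offset[mid];
			return 1;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return 0;
}
```

Because the index is mapped once and searched in memory, a miss costs a few comparisons, whereas a miss on a loose object costs a failed stat() and a negative dcache entry.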
* [PATCH] Give --full-objects flag to rev-list when preparing a dumb server. 2005-07-07 23:58 ` Linus Torvalds 2005-07-08 1:02 ` [PATCH] rev-list: add "--full-objects" flag Junio C Hamano @ 2005-07-08 1:03 ` Junio C Hamano 1 sibling, 0 replies; 66+ messages in thread From: Junio C Hamano @ 2005-07-08 1:03 UTC (permalink / raw) To: Linus Torvalds; +Cc: git >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes: LT> This is nasty - if you mis-spell "self-sufficient" (easy enough to do) LT> you'll never know the end result isn't what you expected. It won't warn LT> you in any way, it will just make a non-self-sufficient pack.. To match the change of flag name to --full-objects,... ------------ This adds --full flag to git-repack-script, and uses it when preparing the dumb server material. Signed-off-by: Junio C Hamano <junkio@cox.net> --- git-repack-script | 10 +++++++++- git-update-dumb-server-script | 2 +- 2 files changed, 10 insertions(+), 2 deletions(-) 0617ae867e7e27a7b484827f882fe7b396bea004 diff --git a/git-repack-script b/git-repack-script --- a/git-repack-script +++ b/git-repack-script @@ -1,8 +1,16 @@ #!/bin/sh : ${GIT_DIR=.git} : ${GIT_OBJECT_DIRECTORY="$GIT_DIR/objects"} + +case "$1" in +--full) + objects=--full-objects ;; +*) + objects=--objects ;; +esac + rm -f .tmp-pack-* -packname=$(git-rev-list --unpacked --objects $(git-rev-parse --all) | +packname=$(git-rev-list --unpacked $objects $(git-rev-parse --all) | git-pack-objects --non-empty --incremental .tmp-pack) || exit 1 if [ -z "$packname" ]; then diff --git a/git-update-dumb-server-script b/git-update-dumb-server-script --- a/git-update-dumb-server-script +++ b/git-update-dumb-server-script @@ -26,7 +26,7 @@ plain_size=$( if test $max_plain_size -lt $plain_size >/dev/null then - git-repack-script && git-prune-packed + git-repack-script --full && git-prune-packed fi && git-update-dumb-server && ------------ ^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH] Use --objects=self-sufficient flag to rev-list. 2005-07-07 23:16 ` [PATCH] Pull efficiently from a dumb git store Junio C Hamano 2005-07-07 23:50 ` [PATCH] rev-list: add "--objects=self-sufficient" flag Junio C Hamano @ 2005-07-07 23:50 ` Junio C Hamano 1 sibling, 0 replies; 66+ messages in thread From: Junio C Hamano @ 2005-07-07 23:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: git This adds --self-sufficient flag to git-repack-script, and uses it when preparing the dumb server material. Signed-off-by: Junio C Hamano <junkio@cox.net> --- *** This makes things easier for the dumb puller because *** self-sufficient pack means less falling back on traditional *** http-pull. git-repack-script | 10 +++++++++- git-update-dumb-server-script | 2 +- 2 files changed, 10 insertions(+), 2 deletions(-) 6b0568181ede5540706bcdf69868102f554a2f8a diff --git a/git-repack-script b/git-repack-script --- a/git-repack-script +++ b/git-repack-script @@ -1,8 +1,16 @@ #!/bin/sh : ${GIT_DIR=.git} : ${GIT_OBJECT_DIRECTORY="$GIT_DIR/objects"} + +case "$1" in +--self-sufficient) + objects=--objects=self-sufficient ;; +*) + objects=--objects ;; +esac + rm -f .tmp-pack-* -packname=$(git-rev-list --unpacked --objects $(git-rev-parse --all) | +packname=$(git-rev-list --unpacked $objects $(git-rev-parse --all) | git-pack-objects --non-empty --incremental .tmp-pack) || exit 1 if [ -z "$packname" ]; then diff --git a/git-update-dumb-server-script b/git-update-dumb-server-script --- a/git-update-dumb-server-script +++ b/git-update-dumb-server-script @@ -26,7 +26,7 @@ plain_size=$( if test $max_plain_size -lt $plain_size >/dev/null then - git-repack-script && git-prune-packed + git-repack-script --self-sufficient && git-prune-packed fi && git-update-dumb-server && ------------ ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12 2005-07-07 22:52 ` Linus Torvalds 2005-07-07 23:16 ` [PATCH] Pull efficiently from a dumb git store Junio C Hamano @ 2005-07-07 23:52 ` Tony Luck 2005-07-07 23:54 ` Junio C Hamano 2005-07-07 23:59 ` Linus Torvalds 1 sibling, 2 replies; 66+ messages in thread From: Tony Luck @ 2005-07-07 23:52 UTC (permalink / raw) To: Linus Torvalds; +Cc: Petr Baudis, Junio C Hamano, git > > So, what _is_ then the way to pull now, actually? If we use rsync, won't > > we end up with having the objects we previous had twice now? > > Rsync works fine. You can either unpack the pack you get, or, if you > prefer, just run > > git-prune-packed cg-update from a local repo that contains packs is broken though :-( Also "git-fsck-cache" in a repo that is fully packed complains: fatal: No default references -Tony ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12 2005-07-07 23:52 ` [ANNOUNCE] Cogito-0.12 Tony Luck @ 2005-07-07 23:54 ` Junio C Hamano 2005-07-07 23:59 ` Linus Torvalds 1 sibling, 0 replies; 66+ messages in thread From: Junio C Hamano @ 2005-07-07 23:54 UTC (permalink / raw) To: Tony Luck; +Cc: git >>>>> "TL" == Tony Luck <tony.luck@gmail.com> writes: TL> Also "git-fsck-cache" in a repo that is fully packed complains: TL> fatal: No default references "git-fsck-cache --full", perhaps? ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12 2005-07-07 23:52 ` [ANNOUNCE] Cogito-0.12 Tony Luck 2005-07-07 23:54 ` Junio C Hamano @ 2005-07-07 23:59 ` Linus Torvalds 2005-07-08 0:09 ` Tony Luck 2005-07-08 0:09 ` Linus Torvalds 1 sibling, 2 replies; 66+ messages in thread From: Linus Torvalds @ 2005-07-07 23:59 UTC (permalink / raw) To: Tony Luck; +Cc: Petr Baudis, Junio C Hamano, git On Thu, 7 Jul 2005, Tony Luck wrote: > > > > So, what _is_ then the way to pull now, actually? If we use rsync, won't > > > we end up with having the objects we previous had twice now? > > > > Rsync works fine. You can either unpack the pack you get, or, if you > > prefer, just run > > > > git-prune-packed > > cg-update from a local repo that contains packs is broken though :-( Is this with cg-0.12? The most recent release should be happy with packs. > Also "git-fsck-cache" in a repo that is fully packed complains: > > fatal: No default references Ahh, that's true. I knew about it, and forgot. Will fix, Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12 2005-07-07 23:59 ` Linus Torvalds @ 2005-07-08 0:09 ` Tony Luck 2005-07-08 0:23 ` Linus Torvalds 2005-07-08 0:09 ` Linus Torvalds 1 sibling, 1 reply; 66+ messages in thread From: Tony Luck @ 2005-07-08 0:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: Petr Baudis, Junio C Hamano, git > > cg-update from a local repo that contains packs is broken though :-( > > Is this with cg-0.12? The most recent release should be happy with packs. Yes ... I pulled, built and installed the latest cogito this afternoon before trying to touch anything involving packs. cg-version says: cogito-0.12 (b21855b8734ca76ea08c0c17e4a204191b6e3add) This is what happens ("linus" is a local branch just pulled from kernel.org, so it just contains one pack file and its index). $ cg-update linus `/home/aegl/GIT/linus/.git/refs/heads/master' -> `.git/refs/heads/linus' does not exist /home/aegl/GIT/linus/.git/objects/04/3d051615aa5da09a7e44f1edbb69 798458e067 Cannot obtain needed object 043d051615aa5da09a7e44f1edbb69798458e067 while processing commit 0000000000000000000000000000000000000000. cg-pull: objects pull failed If I try it again, it thinks things are up to date (since it mistakenly updated the .git/refs/heads/linus), but then fails to apply (since it doesn't have the objects it needs). -Tony ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12 2005-07-08 0:09 ` Tony Luck @ 2005-07-08 0:23 ` Linus Torvalds 2005-07-09 21:58 ` Russell King 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2005-07-08 0:23 UTC (permalink / raw) To: Tony Luck; +Cc: Petr Baudis, Junio C Hamano, git On Thu, 7 Jul 2005, Tony Luck wrote: > > This is what happens ("linus" is a local branch just pulled from kernel.org, > so it just contains one pack file and its index). > > $ cg-update linus > `/home/aegl/GIT/linus/.git/refs/heads/master' -> `.git/refs/heads/linus' > does not exist /home/aegl/GIT/linus/.git/objects/04/3d051615aa5da09a7e44f1edbb69 > 798458e067 > Cannot obtain needed object 043d051615aa5da09a7e44f1edbb69798458e067 > while processing commit 0000000000000000000000000000000000000000. > cg-pull: objects pull failed Ok. The immediate fix is to just unpack the pack: mv .git/objects/pack/* .git/ for i in .git/*.pack; do git-unpack-objects < $i; done (or similar - the above is untested, but I think it should be obvious enough what I'm trying to do). > If I try it again, it thinks things are up to date (since it mistakenly > updated the .git/refs/heads/linus), but then fails to apply (since it > doesn't have the objects it needs). Ok, that's a worse bug, it really shouldn't update the head until _after_ the pull has succeeded. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12 2005-07-08 0:23 ` Linus Torvalds @ 2005-07-09 21:58 ` Russell King 2005-07-09 22:29 ` Russell King 2005-07-10 8:09 ` Russell King 0 siblings, 2 replies; 66+ messages in thread From: Russell King @ 2005-07-09 21:58 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tony Luck, Petr Baudis, Junio C Hamano, git On Thu, Jul 07, 2005 at 05:23:26PM -0700, Linus Torvalds wrote: > On Thu, 7 Jul 2005, Tony Luck wrote: > > This is what happens ("linus" is a local branch just pulled from kernel.org, > > so it just contains one pack file and its index). > > > > $ cg-update linus > > `/home/aegl/GIT/linus/.git/refs/heads/master' -> `.git/refs/heads/linus' > > does not exist /home/aegl/GIT/linus/.git/objects/04/3d051615aa5da09a7e44f1edbb69 > > 798458e067 > > Cannot obtain needed object 043d051615aa5da09a7e44f1edbb69798458e067 > > while processing commit 0000000000000000000000000000000000000000. > > cg-pull: objects pull failed > > Ok. The immediate fix is to just unpack the pack: > > mv .git/objects/pack/* .git/ > for i in .git/*.pack; do git-unpack-objects < $i; done > > (or similar - the above is untested, but I think it should be obvious > enough what I'm trying to do). This is evil on the bandwidth, since you'll keep refetching the packed object (64MB of it) over and over. However, I've tried the above, and I get: $ mv .git/objects/pack/* .git/ $ for i in .git/*.pack; do git-unpack-objects < $i; done Unpacking 55435 objects fatal: inflate returned -3 so it seems that the pack is corrupt... or something. 
$ md5sum .git/*.pack 2be38f2947b99bcd088c1930122aadec .git/pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack and git-fsck-cache produces lots and lots of: dangling tree fae688b62db0b553aae0bf17f0f70e93819dec2b broken link from tree faed7d798b84f107dbb9ff8fa97fb909c9ea5347 to blob 008e19210e66f01fbaef1aba30243850766b8b12 broken link from tree faed7d798b84f107dbb9ff8fa97fb909c9ea5347 to blob edae09a4b021e353ab4fbba756e31492fbb8fd2e broken link from tree faed7d798b84f107dbb9ff8fa97fb909c9ea5347 to blob d098b3ba35384fb912989348fd6da59820711ca4 ... etc ... -- Russell King ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12 2005-07-09 21:58 ` Russell King @ 2005-07-09 22:29 ` Russell King 2005-07-09 23:46 ` Junio C Hamano 2005-07-10 8:09 ` Russell King 1 sibling, 1 reply; 66+ messages in thread From: Russell King @ 2005-07-09 22:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: Tony Luck, Petr Baudis, Junio C Hamano, git On Sat, Jul 09, 2005 at 10:58:18PM +0100, Russell King wrote: > On Thu, Jul 07, 2005 at 05:23:26PM -0700, Linus Torvalds wrote: > > On Thu, 7 Jul 2005, Tony Luck wrote: > > > This is what happens ("linus" is a local branch just pulled from kernel.org, > > > so it just contains one pack file and its index). > > > > > > $ cg-update linus > > > `/home/aegl/GIT/linus/.git/refs/heads/master' -> `.git/refs/heads/linus' > > > does not exist /home/aegl/GIT/linus/.git/objects/04/3d051615aa5da09a7e44f1edbb69 > > > 798458e067 > > > Cannot obtain needed object 043d051615aa5da09a7e44f1edbb69798458e067 > > > while processing commit 0000000000000000000000000000000000000000. > > > cg-pull: objects pull failed > > > > Ok. The immediate fix is to just unpack the pack: > > > > mv .git/objects/pack/* .git/ > > for i in .git/*.pack; do git-unpack-objects < $i; done > > > > (or similar - the above is untested, but I think it should be obvious > > enough what I'm trying to do). > > This is evil on the bandwidth, since you'll keep refetching the packed > object (64MB of it) over and over. > > However, I've tried the above, and I get: > > $ mv .git/objects/pack/* .git/ > $ for i in .git/*.pack; do git-unpack-objects < $i; done > Unpacking 55435 objects > fatal: inflate returned -3 > > so it seems that the pack is corrupt... or something. 
> > $ md5sum .git/*.pack > 2be38f2947b99bcd088c1930122aadec .git/pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack > > and git-fsck-cache produces lots and lots of: > > dangling tree fae688b62db0b553aae0bf17f0f70e93819dec2b > broken link from tree faed7d798b84f107dbb9ff8fa97fb909c9ea5347 > to blob 008e19210e66f01fbaef1aba30243850766b8b12 > broken link from tree faed7d798b84f107dbb9ff8fa97fb909c9ea5347 > to blob edae09a4b021e353ab4fbba756e31492fbb8fd2e > broken link from tree faed7d798b84f107dbb9ff8fa97fb909c9ea5347 > to blob d098b3ba35384fb912989348fd6da59820711ca4 > ... etc ... Additional information: x86 box, running FC2, cogito 0.12 built from the src.rpm on kernel.org. Lots of disk space (blocks + inodes) remaining. Pretty please can we stop breaking rmk's git/cogito/repos/scripts ? -- Russell King ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12 2005-07-09 22:29 ` Russell King @ 2005-07-09 23:46 ` Junio C Hamano 2005-07-10 5:02 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2005-07-09 23:46 UTC (permalink / raw) To: Russell King; +Cc: Linus Torvalds, Tony Luck, Petr Baudis, git >>>>> "RK" == Russell King <rmk@arm.linux.org.uk> writes: >> $ mv .git/objects/pack/* .git/ >> $ for i in .git/*.pack; do git-unpack-objects < $i; done >> Unpacking 55435 objects >> fatal: inflate returned -3 >> >> so it seems that the pack is corrupt... or something. >> >> $ md5sum .git/*.pack >> 2be38f2947b99bcd088c1930122aadec .git/pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack RK> Additional information: x86 box, running FC2, cogito 0.12 built from RK> the src.rpm on kernel.org. Lots of disk space (blocks + inodes) RK> remaining. Hmph, I am worried about that inflate() failure. An x86 box, running Debian sarge, vanilla git without Cogito built from Linus tip. From here, it does not look like pack corruption to me; unless you broke md5sum and found a collision, that is. : siamese; type git-unpack-objects git-unpack-objects is /home/junio/bin/Linux/git-unpack-objects : siamese; ldd /home/junio/bin/Linux/git-unpack-objects libz.so.1 => /usr/lib/libz.so.1 (0xb7f8e000) libcrypto.so.0.9.7 => /usr/lib/i686/cmov/libcrypto.so.0.9.7 (0xb7e8e000) libc.so.6 => /lib/tls/libc.so.6 (0xb7d59000) libdl.so.2 => /lib/tls/libdl.so.2 (0xb7d56000) /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xb7fad000) : siamese; cd /opt/packrat/playpen/public/in-place/git/linux-2.6/ : siamese; md5sum .git/objects/pack/pack-*.pack 2be38f2947b99bcd088c1930122aadec .git/objects/pack/pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack : siamese; cd ..
: siamese; mkdir junk : siamese; cd junk : siamese; git-init-db defaulting to local storage area : siamese; git-unpack-objects <../linux-2.6/.git/objects/pack/pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack Unpacking 55435 objects 100% (55434/55435) done ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12 2005-07-09 23:46 ` Junio C Hamano @ 2005-07-10 5:02 ` Linus Torvalds 2005-07-10 5:15 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2005-07-10 5:02 UTC (permalink / raw) To: Junio C Hamano; +Cc: Russell King, Tony Luck, Petr Baudis, git On Sat, 9 Jul 2005, Junio C Hamano wrote: > > >>>>> "RK" == Russell King <rmk@arm.linux.org.uk> writes: > > >> $ mv .git/objects/pack/* .git/ > >> $ for i in .git/*.pack; do git-unpack-objects < $i; done > >> Unpacking 55435 objects > >> fatal: inflate returned -3 Ahh, damn. > >> so it seems that the pack is corrupt... or something. No, I think you're using cogito-0.12, and I fixed this one-liner that didn't make it into cogito: diff-tree 291ec0f2d2ce65e5ccb876b46d6468af49ddb82e (from 72347a233e6f3c176059a28f0817de6654ef29c7) Author: Linus Torvalds <torvalds@g5.osdl.org> Date: Tue Jul 5 17:06:09 2005 -0700 Don't special-case a zero-sized compression. zlib actually writes a header for that case, and while ignoring that header will get us the right data, it will also end up messing up our stream position. So we actually want zlib to "uncompress" even an empty object. diff --git a/unpack-objects.c b/unpack-objects.c --- a/unpack-objects.c +++ b/unpack-objects.c @@ -55,8 +55,6 @@ static void *get_data(unsigned long size z_stream stream; void *buf = xmalloc(size); - if (!size) - return buf; memset(&stream, 0, sizeof(stream)); stream.next_out = buf; (well, I guess it's a two-liner.). What happens is that there's one zero-sized blob in the kernel archive history, and when we pack it, we pack it as an 8-byte "compressed" thing (hey, zlib has a header, that's normal), but when we unpack it, because we notice that the result is zero, we'd just skip the zlib header. Which was wrong, because now the _next_ object will try to unpack at the wrong offset, and that explains why you get -3 ("bad data"). Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
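The failure described above is a stream-position bug rather than data corruption: as I understand the format, the pack records each object's uncompressed size but not its compressed length, so the reader relies on the decompressor to consume exactly the bytes of each object, including the 8 bytes zlib emits for an empty one. The toy model below replaces zlib with a made-up self-delimiting encoding (a 'Z' marker, a length byte, then the raw bytes) purely so it runs without libz; toy_deflate(), toy_inflate() and unpack_all() are hypothetical, and -3 is used only because it matches zlib's Z_DATA_ERROR. Calling unpack_all() with skip_empty set reproduces the removed special case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAGIC 'Z'
#define TOY_BAD_DATA (-3)  /* same value as zlib's Z_DATA_ERROR */

/* Toy stand-in for deflate: 'Z', length byte, raw bytes.  Like zlib, an
 * empty payload still produces a non-empty encoding (2 bytes here). */
static size_t toy_deflate(const char *data, unsigned char len,
			  unsigned char *out)
{
	out[0] = TOY_MAGIC;
	out[1] = len;
	memcpy(out + 2, data, len);
	return 2 + (size_t)len;
}

/* Decode one toy stream; return bytes consumed, or 0 on bad data. */
static size_t toy_inflate(const unsigned char *in, size_t avail,
			  char *out, size_t outsz)
{
	size_t len;

	if (avail < 2 || in[0] != TOY_MAGIC)
		return 0;
	len = in[1];
	if (avail < 2 + len || len > outsz)
		return 0;
	memcpy(out, in + 2, len);
	return 2 + len;
}

/* Toy pack: repeated [uncompressed size byte][toy stream].  Returns the
 * number of objects unpacked, or TOY_BAD_DATA. */
static int unpack_all(const unsigned char *pack, size_t len, int skip_empty)
{
	size_t pos = 0;
	int nr = 0;
	char buf[256];

	while (pos < len) {
		size_t size = pack[pos++];
		size_t used;

		if (size == 0 && skip_empty) {
			/* The removed special case: the empty object's
			 * stream is never consumed, so pos goes stale. */
			nr++;
			continue;
		}
		used = toy_inflate(pack + pos, len - pos, buf, sizeof(buf));
		if (!used || used - 2 != size)
			return TOY_BAD_DATA;
		pos += used;
		nr++;
	}
	assert(pos == len);
	return nr;
}
```

With an empty object between two non-empty ones, the correct reader unpacks all three, while the special-cased reader treats the empty stream's marker byte as the next object's size and then fails on what follows.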
* Re: [ANNOUNCE] Cogito-0.12 2005-07-10 5:02 ` Linus Torvalds @ 2005-07-10 5:15 ` Linus Torvalds 2005-07-10 6:55 ` Russell King 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2005-07-10 5:15 UTC (permalink / raw) To: Junio C Hamano; +Cc: Russell King, Tony Luck, Petr Baudis, git On Sat, 9 Jul 2005, Linus Torvalds wrote: > > No, I think you're using cogito-0.12, and I fixed this one-liner that > didn't make it into cogito: Btw, this will only affect unpacking. The packed objects should be fine, and you'll never see this if you keep the index file around and have the pack in .git/objects/pack, because then git won't ever do the "streaming" thing, it will look up exactly where the object is using the index, and it doesn't matter that it doesn't look at the compressed data of a zero-sized object. So cogito isn't terminally broken, it just can't do the streaming unpack. And as Russell points out, unpacking the packs after downloading them is actually the wrong thing to do, because you break the rsync'ness of your archive, so you'll keep on downloading the pack-files over and over again. So you can fix this by getting the current git release, but you probably shouldn't even care. Just use the pack-files as pack-files instead, and enjoy the higher performance and lower disk use ;). Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12 2005-07-10 5:15 ` Linus Torvalds @ 2005-07-10 6:55 ` Russell King 2005-07-10 7:15 ` Junio C Hamano 0 siblings, 1 reply; 66+ messages in thread From: Russell King @ 2005-07-10 6:55 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, Tony Luck, Petr Baudis, git On Sat, Jul 09, 2005 at 10:15:41PM -0700, Linus Torvalds wrote: > So you can fix this by getting the current git release, but you probably > shouldn't even care. Just use the pack-files as pack-files instead, and > enjoy the higher performance and lower disk use ;). I would if I could, but my workflow involves having an untouched local copy of your tree and several trees for each area. This involves updates using relative paths, and as has already been found elsewhere, this (with cogito 0.12) doesn't work with packed objects yet. -- Russell King ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12 2005-07-10 6:55 ` Russell King @ 2005-07-10 7:15 ` Junio C Hamano 2005-07-10 12:46 ` Russell King 0 siblings, 1 reply; 66+ messages in thread From: Junio C Hamano @ 2005-07-10 7:15 UTC (permalink / raw) To: Russell King; +Cc: Linus Torvalds, Tony Luck, Petr Baudis, git >>>>> "RK" == Russell King <rmk@arm.linux.org.uk> writes: RK> I would if I could, but my workflow involves having an untouched local RK> copy of your tree and several trees for each area. RK> This involves updates using relative paths, and as has already been RK> found elsewhere, this (with cogito 0.12) doesn't work with packed RK> objects yet. As a workaround until Cogito gets updated, would it help to have the environment variable GIT_ALTERNATE_OBJECT_DIRECTORIES pointing at the untouched copy of Linus tree's .git/objects/ directory? All your other trees would find the objects in your copied-Linus tree (including packed one) available to them already and hopefully pull breakage does not even have to touch those objects. ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12 2005-07-10 7:15 ` Junio C Hamano @ 2005-07-10 12:46 ` Russell King 2005-07-10 16:51 ` Linus Torvalds 0 siblings, 1 reply; 66+ messages in thread From: Russell King @ 2005-07-10 12:46 UTC (permalink / raw) To: Junio C Hamano; +Cc: Linus Torvalds, Tony Luck, Petr Baudis, git On Sun, Jul 10, 2005 at 12:15:48AM -0700, Junio C Hamano wrote: > As a workaround until Cogito gets updated, would it help to have > the environment variable GIT_ALTERNATE_OBJECT_DIRECTORIES > pointing at the untouched copy of Linus tree's .git/objects/ > directory? All your other trees would find the objects in your > copied-Linus tree (including packed one) available to them > already and hopefully pull breakage does not even have to touch > those objects. That seems to work, thanks. I think this is a good idea anyway - it seems to mean that each working tree ends up with an empty set of .git/objects/* directories. When new work is done in a tree, the corresponding objects then appear, and only these objects need transferring upstream. It means that rsync --delete-after can (in theory) be used when making changes available to the upstream maintainer. -- Russell King ^ permalink raw reply [flat|nested] 66+ messages in thread
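The workaround works because object lookup can fall back to more than one object directory. As a rough sketch of the idea for loose objects only (packs in the alternate directories are ignored here), the code below builds the candidate paths in search order: the repository's own object directory first, then each entry of GIT_ALTERNATE_OBJECT_DIRECTORIES, which is a colon-separated list like PATH. Within each directory the first two hex digits of the SHA1 name the subdirectory and the other 38 the file, as in the .git/objects/04/3d0516... path in Tony's error message. The function names and the fixed TOY_PATH_MAX buffer size are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define TOY_PATH_MAX 4096

/* "<dir>/xx/<38 hex>" for a 40-hex-digit object name; 0 on success. */
static int loose_object_path(char *out, size_t outsz,
			     const char *dir, const char *hex)
{
	int n;

	assert(dir != NULL && hex != NULL);
	if (strlen(hex) != 40)
		return -1;
	n = snprintf(out, outsz, "%s/%.2s/%s", dir, hex, hex + 2);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Fill paths[] with candidates in search order: own object directory,
 * then each colon-separated alternate.  Returns the number filled. */
static size_t candidate_paths(char paths[][TOY_PATH_MAX], size_t max,
			      const char *objdir, const char *alternates,
			      const char *hex)
{
	char dir[TOY_PATH_MAX];
	size_t n = 0;
	const char *p = alternates;

	if (n < max && !loose_object_path(paths[n], TOY_PATH_MAX, objdir, hex))
		n++;
	while (p && *p && n < max) {
		const char *colon = strchr(p, ':');
		size_t len = colon ? (size_t)(colon - p) : strlen(p);

		if (len > 0 && len < sizeof(dir)) {
			memcpy(dir, p, len);
			dir[len] = '\0';
			if (!loose_object_path(paths[n], TOY_PATH_MAX, dir, hex))
				n++;
		}
		if (!colon)
			break;
		p = colon + 1;
	}
	return n;
}

/* Index of the first candidate that exists on disk, or -1. */
static int find_loose_object(char paths[][TOY_PATH_MAX], size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (access(paths[i], F_OK) == 0)
			return (int)i;
	return -1;
}
```

With the variable pointing at the untouched copy of Linus' tree, anything already in that tree is found through the second candidate, which is why each working tree's own .git/objects can stay nearly empty.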
* Re: [ANNOUNCE] Cogito-0.12 2005-07-10 12:46 ` Russell King @ 2005-07-10 16:51 ` Linus Torvalds 2005-07-10 19:15 ` Russell King 0 siblings, 1 reply; 66+ messages in thread From: Linus Torvalds @ 2005-07-10 16:51 UTC (permalink / raw) To: Russell King; +Cc: Junio C Hamano, Tony Luck, Petr Baudis, git On Sun, 10 Jul 2005, Russell King wrote: > > It means that rsync --delete-after can (in theory) be used when > making changes available to the upstream maintainer. I'd suggest against that from a safety standpoint (no backups), but what you _can_ do is to upload only the objects I don't have. This actually works - I already synced several weeks ago with Paul Mackerras, who had made his ppc64 git thing contain only the objects that I didn't have. In other words, if you have my tree pointed to by GIT_ALTERNATE_OBJECT_DIRECTORIES, and you populate your tree only with new files, you can actually upload that small "sparsely populated" tree as-is (without any of the objects that came from my tree), and I should be able to pull it as-is. Well, at least with rsync. I think my git "pack" send/receive thing might be unhappy about a partial tree, but that's something I can fix, so if this makes it easier for people (you can create a totally new tree _really_ cheaply and also upload it and move it around very cheaply), then I'm ok with pulling from partial repositories, and I have indeed already done so in the past. Btw, if people start doing this, then I really think we want a ".git/config" file, so that you can have different alternate object directories for different git directories without having to remember to set the environment variables all the time. Linus ^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10 16:51 ` Linus Torvalds
@ 2005-07-10 19:15 ` Russell King
  2005-07-10 20:03 ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Russell King @ 2005-07-10 19:15 UTC
  To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git

On Sun, Jul 10, 2005 at 09:51:16AM -0700, Linus Torvalds wrote:
> On Sun, 10 Jul 2005, Russell King wrote:
> > It means that rsync --delete-after can (in theory) be used when
> > making changes available to the upstream maintainer.
>
> I'd suggest against that from a safety standpoint (no backups), but what
> you _can_ do is to upload only the objects I don't have.
>
> This actually works - I already synced several weeks ago with Paul
> Mackerras, who had made his ppc64 git thing contain only the objects that
> I didn't have.

Ok, let's give this a go then.  However, I'm not confident in this
working, especially after seeing the output of git-fsck-cache --full...
and I've no idea _why_ it's complaining.

I've pushed this (partial) tree out to master.kernel.org:~rmk/linux-2.6-arm.git

Below is the usual mail.

$ export | grep GIT_
declare -x GIT_ALTERNATE_OBJECT_DIRECTORIES="/home/rmk/git/linux-2.6/.git/objects"
$ git-fsck-cache --full
error: cannot read sha1_file for 0084227438c28d26bc2d089b1facc4675310f741
bad sha1 entry '0084227438c28d26bc2d089b1facc4675310f741'
error: cannot read sha1_file for 008c1ddc1fc2854b64fcb49a40f1c933d116fb5c
bad sha1 entry '008c1ddc1fc2854b64fcb49a40f1c933d116fb5c'
...
error: cannot read sha1_file for 83c28d2c90fe720b5a315b89301cf3a519ffed88
bad sha1 entry '83c28d2c90fe720b5a315b89301cf3a519ffed88'
dangling commit 043d051615aa5da09a7e44f1edbb69798458e067
dangling commit a92b7b80579fe68fe229892815c750f6652eb6a9
$ grep . .git/refs/heads/*
.git/refs/heads/master:ec6bced6c7b92904f5ead39c9c1b8dc734e6eff0
.git/refs/heads/origin:f179bc77d09b9087bfc559d0368bba350342ac76
.git/refs/heads/smp:053a7b5b7617a72d7c61b6f84196d1c0f79b9849
$ cd $GIT_ALTERNATE_OBJECT_DIRECTORIES/../..
$ git-fsck-cache --full
$

Could this be because cogito doesn't know how to handle this setup
properly yet?  Have I just destroyed my git tree by trying to apply
stuff to it?

---

Linus, Andrew,

Please incorporate the latest ARM changes, which can be found at:

        master.kernel.org:/home/rmk/linux-2.6-arm.git

This will update the following files:

 arch/arm/mach-omap/Kconfig              |  221 -----
 arch/arm/mach-omap/Makefile             |   40
 arch/arm/mach-omap/Makefile.boot        |    4
 arch/arm/mach-omap/board-generic.c      |  100 --
 arch/arm/mach-omap/board-h2.c           |  189 ----
 arch/arm/mach-omap/board-h3.c           |  207 -----
 arch/arm/mach-omap/board-innovator.c    |  282 ------
 arch/arm/mach-omap/board-netstar.c      |  153 ---
 arch/arm/mach-omap/board-osk.c          |  171 ----
 arch/arm/mach-omap/board-perseus2.c     |  191 ----
 arch/arm/mach-omap/board-voiceblue.c    |  258 ------
 arch/arm/mach-omap/clock.c              | 1076 --------------------------
 arch/arm/mach-omap/clock.h              |  112 --
 arch/arm/mach-omap/common.c             |  549 -------------
 arch/arm/mach-omap/common.h             |   36
 arch/arm/mach-omap/dma.c                | 1086 --------------------------
 arch/arm/mach-omap/fpga.c               |  188 ----
 arch/arm/mach-omap/gpio.c               |  762 ------------------
 arch/arm/mach-omap/irq.c                |  219 -----
 arch/arm/mach-omap/leds-h2p2-debug.c    |  144 ---
 arch/arm/mach-omap/leds-innovator.c     |  103 --
 arch/arm/mach-omap/leds-osk.c           |  198 ----
 arch/arm/mach-omap/leds.c               |   61 -
 arch/arm/mach-omap/leds.h               |    3
 arch/arm/mach-omap/mcbsp.c              |  685 ----------------
 arch/arm/mach-omap/mux.c                |  163 ---
 arch/arm/mach-omap/ocpi.c               |  114 --
 arch/arm/mach-omap/pm.c                 |  632 ---------------
 arch/arm/mach-omap/sleep.S              |  314 -------
 arch/arm/mach-omap/time.c               |  424 ----------
 arch/arm/mach-omap/usb.c                |  593 -------------
 arch/arm/Kconfig                        |    6
 arch/arm/Makefile                       |    6
 arch/arm/configs/enp2611_defconfig      |   20
 arch/arm/configs/ixdp2400_defconfig     |   20
 arch/arm/configs/ixdp2401_defconfig     |   20
 arch/arm/configs/ixdp2800_defconfig     |   20
 arch/arm/configs/ixdp2801_defconfig     |   20
 arch/arm/configs/omap_h2_1610_defconfig |  117 +-
 arch/arm/mach-ixp2000/core.c            |   55 -
 arch/arm/mach-ixp2000/enp2611.c         |    1
 arch/arm/mach-ixp2000/ixdp2x00.c        |    1
 arch/arm/mach-ixp2000/ixdp2x01.c        |    1
 arch/arm/mach-omap1/Kconfig             |  144 +++
 arch/arm/mach-omap1/Makefile            |   30
 arch/arm/mach-omap1/Makefile.boot       |    3
 arch/arm/mach-omap1/board-generic.c     |   99 ++
 arch/arm/mach-omap1/board-h2.c          |  188 ++++
 arch/arm/mach-omap1/board-h3.c          |  206 ++++
 arch/arm/mach-omap1/board-innovator.c   |  281 ++++++
 arch/arm/mach-omap1/board-netstar.c     |  152 +++
 arch/arm/mach-omap1/board-osk.c         |  170 ++++
 arch/arm/mach-omap1/board-perseus2.c    |  190 ++++
 arch/arm/mach-omap1/board-voiceblue.c   |  257 ++++++
 arch/arm/mach-omap1/fpga.c              |  188 ++++
 arch/arm/mach-omap1/id.c                |  188 ++++
 arch/arm/mach-omap1/io.c                |  115 ++
 arch/arm/mach-omap1/irq.c               |  234 +++++
 arch/arm/mach-omap1/leds-h2p2-debug.c   |  144 +++
 arch/arm/mach-omap1/leds-innovator.c    |  103 ++
 arch/arm/mach-omap1/leds-osk.c          |  194 ++++
 arch/arm/mach-omap1/leds.c              |   61 +
 arch/arm/mach-omap1/leds.h              |    3
 arch/arm/mach-omap1/serial.c            |  200 ++++
 arch/arm/mach-omap1/time.c              |  436 ++++++++++
 arch/arm/mm/Kconfig                     |    2
 arch/arm/mm/mm-armv.c                   |    4
 arch/arm/plat-omap/Kconfig              |  112 ++
 arch/arm/plat-omap/Makefile             |   17
 arch/arm/plat-omap/clock.c              | 1323 ++++++++++++++++++++++++++++++++
 arch/arm/plat-omap/clock.h              |  120 ++
 arch/arm/plat-omap/common.c             |  135 +++
 arch/arm/plat-omap/cpu-omap.c           |  128 +++
 arch/arm/plat-omap/dma.c                | 1116 ++++++++++++++++++++++++++
 arch/arm/plat-omap/gpio.c               |  762 ++++++++++++++++++
 arch/arm/plat-omap/mcbsp.c              |  758 ++++++++++++++++++
 arch/arm/plat-omap/mux.c                |  160 +++
 arch/arm/plat-omap/ocpi.c               |  114 ++
 arch/arm/plat-omap/pm.c                 |  632 +++++++++++++++
 arch/arm/plat-omap/sleep.S              |  314 +++++++
 arch/arm/plat-omap/usb.c                |  593 ++++++++++++++
 include/asm-arm/arch-ixp2000/platform.h |    1
 include/asm-arm/arch-omap/board-h2.h    |    5
 include/asm-arm/arch-omap/board-h3.h    |    5
 include/asm-arm/arch-omap/board-osk.h   |    5
 include/asm-arm/arch-omap/board.h       |   12
 include/asm-arm/arch-omap/common.h      |   36
 include/asm-arm/arch-omap/dma.h         |    1
 include/asm-arm/arch-omap/hardware.h    |   24
 include/asm-arm/arch-omap/irqs.h        |    3
 include/asm-arm/arch-omap/mux.h         |   28
 include/asm-arm/arch-omap/omap16xx.h    |   32
 include/asm-arm/arch-omap/system.h      |   21
 93 files changed, 10164 insertions(+), 9450 deletions(-)

through these changes:

From: Tony Lindgren: Sun Jul 10 19:58:20 BST 2005
        [PATCH] ARM: 2803/1: OMAP update 11/11: Add cpufreq support

        Patch from Tony Lindgren

        This patch adds minimal cpufreq support for OMAP taking
        advantage of the clock framework.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:19 BST 2005
        [PATCH] ARM: 2805/1: OMAP update 10/11: Update H2 defconfig

        Patch from Tony Lindgren

        This patch updates H2 defconfig.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:18 BST 2005
        [PATCH] ARM: 2804/1: OMAP update 9/11: Update OMAP arch files

        Patch from Tony Lindgren

        This patch by various OMAP developers syncs the OMAP
        specific arch files with the linux-omap tree.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:17 BST 2005
        [PATCH] ARM: 2802/1: OMAP update 8/11: Update OMAP arch files

        Patch from Tony Lindgren

        This patch by various OMAP developers syncs the OMAP
        specific arch files with the linux-omap tree.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:15 BST 2005
        [PATCH] ARM: 2812/1: OMAP update 7c/11: Move arch-omap to plat-omap

        Patch from Tony Lindgren

        This patch move common OMAP code from arch-omap to plat-omap
        directory.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:14 BST 2005
        [PATCH] ARM: 2809/1: OMAP update 7b/11: Move arch-omap to plat-omap

        Patch from Tony Lindgren

        This patch move common OMAP code from arch-omap to plat-omap
        directory.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:13 BST 2005
        [PATCH] ARM: 2807/1: OMAP update 7a/11: Move arch-omap to plat-omap

        Patch from Tony Lindgren

        This patch move common OMAP code from arch-omap to plat-omap
        directory.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:12 BST 2005
        [PATCH] ARM: 2801/1: OMAP update 6/11: Split OMAP1 common code into id, io and serial

        Patch from Tony Lindgren

        This patch by Juha Yrjölä and other OMAP developers splits
        OMAP1 specific common code into OMAP1 id, io, and serial code
        in mach-omap1 directory.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:11 BST 2005
        [PATCH] ARM: 2806/1: OMAP update 5/11: Move board files into mach-omap1 directory

        Patch from Tony Lindgren

        This patch by Paul Mundt and other OMAP developers moves
        OMAP1 board files into mach-omap1 directory.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:10 BST 2005
        [PATCH] ARM: 2799/1: OMAP update 4/11: Move OMAP1 LED code into mach-omap1 directory

        Patch from Tony Lindgren

        This patch by Paul Mundt and other OMAP developers moves
        OMAP1 specific LED code into mach-omap1 directory.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:09 BST 2005
        [PATCH] ARM: 2800/1: OMAP update 3/11: Move OMAP1 core code into mach-omap1 directory

        Patch from Tony Lindgren

        This patch by Paul Mundt and other OMAP developers moves
        OMAP1 specific IRQ, time, and FPGA code into mach-omap1
        directory.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:08 BST 2005
        [PATCH] ARM: 2798/1: OMAP update 2/11: Change ARM Kconfig to support omap1 and omap2

        Patch from Tony Lindgren

        This patch by Paul Mundt and other OMAP developers modifies
        ARM specific Kconfig to allow sharing code between OMAP1 and
        OMAP2 architectures.

        In order to share code between OMAP1 and OMAP2, all OMAP1
        specific code is moved into mach-omap1 directory in the
        following patch. A new mach-omap2 directory will be added
        later on.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Tony Lindgren: Sun Jul 10 19:58:06 BST 2005
        [PATCH] ARM: 2797/1: OMAP update 1/11: Update include files

        Patch from Tony Lindgren

        This patch by various OMAP developers syncs the OMAP
        specific include files with the linux-omap tree.

        Signed-off-by: Tony Lindgren <tony@atomide.com>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Deepak Saxena: Sun Jul 10 19:44:55 BST 2005
        [PATCH] ARM: 2796/1: Fix ARMv5[TEJ] check in MMU initalization

        Patch from Deepak Saxena

        The code in mm-armv.c checks for the condition
        (cpu_architecture()<= ARMv5) in a few places but should be
        checking for ARMv5TEJ as the MMU is shared across all v5
        variations.

        Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Lennert Buytenhek: Sun Jul 10 19:44:54 BST 2005
        [PATCH] ARM: 2795/1: update ixp2000 defconfigs

        Patch from Lennert Buytenhek

        Update the ixp2000 defconfigs from 2.6.12-git6 to 2.6.13-rc2.

        Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

From: Lennert Buytenhek: Sun Jul 10 19:44:53 BST 2005
        [PATCH] ARM: 2793/1: platform serial support for ixp2000

        Patch from Lennert Buytenhek

        This patch converts the ixp2000 serial port over to a platform
        serial device.

        Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
        Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
        Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

-- 
Russell King
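Linus's suggestion quoted at the top of this mail, to upload only the objects he doesn't already have, can be sketched in shell. This is a minimal illustration under stated assumptions, not a git or Cogito tool: the function name `missing_loose_objects` is hypothetical, it assumes a local copy of the upstream object directory is available (for example the one GIT_ALTERNATE_OBJECT_DIRECTORIES points at in the transcript), and it only compares loose objects stored as `xx/yyyy...`, so anything inside a pack file is invisible to it.

```sh
#!/bin/sh
# Hypothetical helper (not part of git or Cogito): print the relative
# names ("xx/yyyy...") of loose objects present in object directory $1
# but absent from object directory $2. Pack files under pack/ are
# ignored because only two-character fan-out directories are scanned.
missing_loose_objects() {
    local_dir=$1
    upstream_dir=$2
    ( cd "$local_dir" && find . -type f -path './??/*' ) |
    sed 's|^\./||' |
    while read -r obj; do
        # Report the object only if upstream has no file of that name.
        [ -f "$upstream_dir/$obj" ] || printf '%s\n' "$obj"
    done | sort
}
```

For instance, `missing_loose_objects .git/objects /home/rmk/git/linux-2.6/.git/objects` would list the candidate files to upload. As the follow-up discussion shows, the hard part is not producing this list but verifying afterwards that the partial repository is complete.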
* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10 19:15 ` Russell King
@ 2005-07-10 20:03 ` Linus Torvalds
  2005-07-10 20:32 ` Russell King
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-10 20:03 UTC
  To: Russell King; +Cc: Junio C Hamano, Petr Baudis, git

On Sun, 10 Jul 2005, Russell King wrote:
>
> Ok, let's give this a go then.  However, I'm not confident in this
> working, especially after seeing the output of git-fsck-cache --full...
> and I've no idea _why_ it's complaining.

Ok, I've downloaded your objects, and it all looks fine. Nothing is
missing.

So something is wrong with the git-fsck-cache handling of
GIT_ALTERNATE_OBJECT_DIRECTORIES, but I don't see what. Other programs
happily see the objects, git-fsck-cache for some reason does not, and
thus complains. I'll try to figure it out.

However, the more I try to make "git-pack-objects" work with a partial
repository, the less happy I am about it. It works wonderfully well with
rsync:, since rsync just doesn't know that something is missing, but
generating the object list when there are objects missing is quite hard.
I can be trivial and say "missing objects aren't interesting", and it
would _work_, but that just doesn't make me happy.

So I'm almost getting ready to say "let's not do this thing after all".

> Could this be because cogito doesn't know how to handle this setup
> properly yet?  Have I just destroyed my git tree by trying to apply
> stuff to it?

This is definitely not a cogito problem, that fsck thing is in git itself.

And no, you didn't destroy your tree - I just merged it, and the merged
results look fine and fsck correctly (and I get the same diffstat you
do). It's just a bug in fsck somewhere that makes it look bad.

That said, my inability to check the pack for completeness for a partial
archive makes me think this partial rsync wasn't such a good idea after
all. It _is_ convenient, though, so I'll have to think about the
send-pack issues some more and see if I can resolve the difficulty
without too many problems. And clearly I need to fix git-fsck-cache.

Anyway, I pushed out the merge, so don't worry about your tree. But let's
hold off on this partial thing for a while, ok?

		Linus
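The fsck failure comes down to one program not searching the alternate object directories that the other programs already consult. A rough sketch of that lookup order follows; it is illustrative only, not git's actual code. The function name `find_loose_object` is hypothetical, GIT_ALTERNATE_OBJECT_DIRECTORIES is assumed to be a colon-separated list (like PATH), and only loose objects at `<dir>/xx/<rest of the hex name>` are checked, so packed objects would still be reported as missing.

```sh
#!/bin/sh
# Hypothetical helper: print the path of a loose object, searching the
# repository's own object directory first and then each colon-separated
# entry of GIT_ALTERNATE_OBJECT_DIRECTORIES. Returns 1 if not found.
# Pack files are not consulted, so "not found" is not conclusive.
find_loose_object() {
    objdir=$1
    sha1=$2
    rest=${sha1#??}            # everything after the first two hex digits
    fanout=${sha1%"$rest"}     # the first two hex digits
    # Own directory first, then the alternates (if the variable is set).
    search="$objdir${GIT_ALTERNATE_OBJECT_DIRECTORIES:+:$GIT_ALTERNATE_OBJECT_DIRECTORIES}"
    old_ifs=$IFS
    IFS=:
    for dir in $search; do
        if [ -f "$dir/$fanout/$rest" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$fanout/$rest"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

A checker that skipped the alternates loop above would see only the repository's own directory, which is consistent with the "cannot read sha1_file" errors in a partial tree whose missing objects live in the alternate store.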
* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10 20:03 ` Linus Torvalds
@ 2005-07-10 20:32 ` Russell King
  2005-07-10 21:40 ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Russell King @ 2005-07-10 20:32 UTC
  To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git

On Sun, Jul 10, 2005 at 01:03:30PM -0700, Linus Torvalds wrote:
> Anyway, I pushed out the merge, so don't worry about your tree. But let's
> hold off on this partial thing for a while, ok?

Thanks, that's good news.  I was fearing having to reconstruct stuff.

Do you want me to re-populate linux-2.6-arm.git to be fully populated
or are you happy for it to just grow the new objects as they become
available?

-- 
Russell King
* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10 20:32 ` Russell King
@ 2005-07-10 21:40 ` Linus Torvalds
  0 siblings, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2005-07-10 21:40 UTC
  To: Russell King; +Cc: Junio C Hamano, Petr Baudis, git

On Sun, 10 Jul 2005, Russell King wrote:
>
> On Sun, Jul 10, 2005 at 01:03:30PM -0700, Linus Torvalds wrote:
> > Anyway, I pushed out the merge, so don't worry about your tree. But let's
> > hold off on this partial thing for a while, ok?
>
> Thanks, that's good news.  I was fearing having to reconstruct stuff.
>
> Do you want me to re-populate linux-2.6-arm.git to be fully populated
> or are you happy for it to just grow the new objects as they become
> available?

We can try just letting it grow. That way I'll have more reason to try to
make the partial-repo thing just work.

		Linus
* Re: [ANNOUNCE] Cogito-0.12
  2005-07-09 21:58 ` Russell King
  2005-07-09 22:29 ` Russell King
@ 2005-07-10 8:09 ` Russell King
  2005-07-10 14:59 ` Petr Baudis
  1 sibling, 1 reply; 66+ messages in thread
From: Russell King @ 2005-07-10 8:09 UTC
  To: Petr Baudis; +Cc: git

On Sat, Jul 09, 2005 at 10:58:18PM +0100, Russell King wrote:
> $ mv .git/objects/pack/* .git/
> $ for i in .git/*.pack; do git-unpack-objects < $i; done
> Unpacking 55435 objects
> fatal: inflate returned -3

This morning's cg-update gave these new errors:

receiving file list ... done

wrote 86 bytes  read 192 bytes  556.00 bytes/sec
total size is 410  speedup is 1.47
Missing object of tag v2.6.11... different source (obsolete tag?)
Missing object of tag v2.6.11-tree... different source (obsolete tag?)
Missing object of tag v2.6.12... different source (obsolete tag?)
Missing object of tag v2.6.12-rc2... different source (obsolete tag?)
Missing object of tag v2.6.12-rc3... different source (obsolete tag?)
Missing object of tag v2.6.12-rc4... different source (obsolete tag?)
Missing object of tag v2.6.12-rc5... different source (obsolete tag?)
Missing object of tag v2.6.12-rc6... different source (obsolete tag?)
Missing object of tag v2.6.13-rc1... different source (obsolete tag?)
Missing object of tag v2.6.13-rc2... different source (obsolete tag?)

-- 
Russell King
* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10 8:09 ` Russell King
@ 2005-07-10 14:59 ` Petr Baudis
  2005-07-11 20:30 ` Chris Wright
  0 siblings, 1 reply; 66+ messages in thread
From: Petr Baudis @ 2005-07-10 14:59 UTC
  To: Russell King; +Cc: git

Dear diary, on Sun, Jul 10, 2005 at 10:09:14AM CEST, I got a letter
where Russell King <rmk@arm.linux.org.uk> told me that...
> On Sat, Jul 09, 2005 at 10:58:18PM +0100, Russell King wrote:
> > $ mv .git/objects/pack/* .git/
> > $ for i in .git/*.pack; do git-unpack-objects < $i; done
> > Unpacking 55435 objects
> > fatal: inflate returned -3
>
> This morning's cg-update gave these new errors:
>
> receiving file list ... done
>
> wrote 86 bytes  read 192 bytes  556.00 bytes/sec
> total size is 410  speedup is 1.47
> Missing object of tag v2.6.11... different source (obsolete tag?)
> Missing object of tag v2.6.11-tree... different source (obsolete tag?)
> Missing object of tag v2.6.12... different source (obsolete tag?)
> Missing object of tag v2.6.12-rc2... different source (obsolete tag?)
> Missing object of tag v2.6.12-rc3... different source (obsolete tag?)
> Missing object of tag v2.6.12-rc4... different source (obsolete tag?)
> Missing object of tag v2.6.12-rc5... different source (obsolete tag?)
> Missing object of tag v2.6.12-rc6... different source (obsolete tag?)
> Missing object of tag v2.6.13-rc1... different source (obsolete tag?)
> Missing object of tag v2.6.13-rc2... different source (obsolete tag?)

Ok, cg-pull didn't quite handle this. I've fixed it so that it should
reasonably handle it now. Hopefully.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..
* Re: [ANNOUNCE] Cogito-0.12
  2005-07-10 14:59 ` Petr Baudis
@ 2005-07-11 20:30 ` Chris Wright
  0 siblings, 0 replies; 66+ messages in thread
From: Chris Wright @ 2005-07-11 20:30 UTC
  To: Petr Baudis; +Cc: Russell King, git

* Petr Baudis (pasky@suse.cz) wrote:
> Ok, cg-pull didn't quite handle this. I've fixed it so that it should
> reasonably handle it now. Hopefully.

Is this plus the zero-sized fix worth making cogito-0.12-2 rpm release?
IOW, these two patches...

diff-tree 291ec0f2d2ce65e5ccb876b46d6468af49ddb82e (from 72347a233e6f3c176059a28f0817de6654ef29c7)
tree a1d3a4e01516f1d924c407a9e42a6df0d13b43b6
parent 72347a233e6f3c176059a28f0817de6654ef29c7
author Linus Torvalds <torvalds@g5.osdl.org> 1120608369 -0700
committer Linus Torvalds <torvalds@g5.osdl.org> 1120608369 -0700

    Don't special-case a zero-sized compression.

    zlib actually writes a header for that case, and while ignoring that
    header will get us the right data, it will also end up messing up our
    stream position.  So we actually want zlib to "uncompress" even an
    empty object.

diff --git a/unpack-objects.c b/unpack-objects.c
--- a/unpack-objects.c
+++ b/unpack-objects.c
@@ -55,8 +55,6 @@ static void *get_data(unsigned long size
 	z_stream stream;
 	void *buf = xmalloc(size);
 
-	if (!size)
-		return buf;
 	memset(&stream, 0, sizeof(stream));
 
 	stream.next_out = buf;

diff-tree 7b754d7f0800117cd97afa5e806e50c7fd16d8c1 (from a2503fd85e6bb7f25d134a5634a1d8efc93fee5f)
Author: Petr Baudis <pasky@suse.cz>
Date:   Sun Jul 10 16:59:28 2005 +0200

    Fix cg-pull to handle packed tags properly

    If the objects referenced by refs/tags/ are packed, it wouldn't detect
    them properly and instead try to refetch them, but they are likely to
    be packed on the other side as well and that makes them impossible to
    be fetched explicitly (which isn't a problem as long as they are the
    same branch). Also, the fetch failure message was confusing.

    Reported by Russel King.

diff --git a/cg-pull b/cg-pull
--- a/cg-pull
+++ b/cg-pull
@@ -294,13 +294,14 @@ $fetch -i -s -u -d "$uri/refs/tags" "$_g
 	for tag in *; do
 		[ "$tag" = "*" ] && break
 		tagid=$(cat $tag)
-		tagfile=objects/${tagid:0:2}/${tagid:2}
-		[ -s "../../$tagfile" ] && continue
+		GIT_DIR=../../../$_git git-cat-file -t "$tagid" >/dev/null 2>&1 && continue
 		echo -n "Missing object of tag $tag... "
+		# In case it's not in a packfile...
+		tagfile=objects/${tagid:0:2}/${tagid:2}
 		if $fetch -i -s "$uri/$tagfile" "../../$tagfile" 2>/dev/null >&2; then
 			echo "retrieved"
 		else
-			echo "different source (obsolete tag?)"
+			echo "unable to retrieve"
 		fi
 	done
 )
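The first patch's rationale, that zlib emits a header even when compressing nothing, is easy to see from the shell. zlib has no standard command-line front end, so this sketch swaps in gzip(1), which wraps the same deflate data in a different header and trailer. The point carries over: an empty input still produces bytes that a reader has to consume to stay in step with the stream. The function name `empty_stream_size` is hypothetical.

```sh
#!/bin/sh
# Sketch: count the bytes gzip emits for an empty input. gzip stands in
# for zlib here; the framing differs, but in both formats compressing
# zero bytes still yields a header, an empty final deflate block and a
# checksum trailer rather than nothing at all.
empty_stream_size() {
    printf '' | gzip -c | wc -c | tr -d ' '
}

echo "compressed size of empty input: $(empty_stream_size) bytes"
```

A reader that short-circuits on a zero-length object, as the removed `if (!size) return buf;` did, never consumes those bytes, so the next object would be decoded from the wrong offset.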
* Re: [ANNOUNCE] Cogito-0.12
  2005-07-07 23:59 ` Linus Torvalds
  2005-07-08 0:09 ` Tony Luck
@ 2005-07-08 0:09 ` Linus Torvalds
  2005-07-08 8:14 ` Petr Baudis
  1 sibling, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2005-07-08 0:09 UTC
  To: Tony Luck; +Cc: Petr Baudis, Junio C Hamano, git

On Thu, 7 Jul 2005, Linus Torvalds wrote:
> >
> > cg-update from a local repo that contains packs is broken though :-(
>
> Is this with cg-0.12? The most recent release should be happy with packs.

Ahh, I see it. It's because it uses "git-local-pull", and yes,
git-local-pull does the old filename assumption. Right?

Ho humm.. That's a bug in local-pull.c, although I'm not sure how to fix
it best.

One option is to just not use it (as in "use git-fetch-pack instead"),
and another is to use GIT_ALTERNATE_OBJECT_DIRECTORIES and just pick up
the files that way. Yet another one is to actually copy over (or link)
the pack-file, but that's likely the least preferable one.

The _simplest_ fix is to use git-fetch-pack. It doesn't give you the
convenient hard-linking, though.

		Linus
* Re: [ANNOUNCE] Cogito-0.12
  2005-07-08 0:09 ` Linus Torvalds
@ 2005-07-08 8:14 ` Petr Baudis
  2005-07-08 15:56 ` Daniel Barkalow
  0 siblings, 1 reply; 66+ messages in thread
From: Petr Baudis @ 2005-07-08 8:14 UTC
  To: Linus Torvalds; +Cc: Tony Luck, Junio C Hamano, Daniel Barkalow, git

Dear diary, on Fri, Jul 08, 2005 at 02:09:48AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
>
>
> On Thu, 7 Jul 2005, Linus Torvalds wrote:
> > >
> > > cg-update from a local repo that contains packs is broken though :-(
> >
> > Is this with cg-0.12? The most recent release should be happy with packs.
>
> Ahh, I see it. It's because it uses "git-local-pull", and yes,
> git-local-pull does the old filename assumption. Right?
>
> Ho humm.. That's a bug in local-pull.c, although I'm not sure how to fix
> it best.

It seems like the whole pull family is totally borked now, and I'm
getting desperate. Looks like this evening will be *pull.c fixing for
me.

  Jul 04 Daniel Barkalow [PATCH 0/2] Support for transferring pack files in git-ssh-*

is what brings some hope to my life, though. Daniel? Any chance we could
get the similar fixes for local-pull? (I didn't actually look at the
patch but briefly.)

I'll try to review the ssh patchset ASAP - I still prefer it much to the
fetch-pack things since its protocol is actually extensible.

> The _simplest_ fix is to use git-fetch-pack. It doesn't give you the
> convenient hard-linking, though.

Hard-linking is an absolute must for local repositories (well, either
that for people who want safety, or symlinking for the rest who want
speed - I want to make that one possible in Cogito ASAP but it requires
some non-trivial changes to some of its assumptions).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
<Espy> be careful, some twit might quote you out of context..
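The hard-linking Petr insists on for local pulls can be sketched as a loop over loose objects that hard-links each file and falls back to a plain copy when linking fails (for example across filesystems). This is a hypothetical helper, `link_or_copy_objects`, not what git-local-pull actually does, and like the original local-pull it only handles loose `xx/yyyy...` files, not pack files. Sharing inodes is reasonable here because git objects are named by their content and never rewritten in place.

```sh
#!/bin/sh
# Hypothetical helper: populate object directory $2 from object
# directory $1 by hard-linking each loose object, falling back to cp
# when ln fails (e.g. source and destination on different filesystems).
# Objects already present in the destination are left untouched.
link_or_copy_objects() {
    src=$1
    dst=$2
    ( cd "$src" && find . -type f -path './??/*' ) |
    sed 's|^\./||' |
    while read -r obj; do
        [ -e "$dst/$obj" ] && continue
        mkdir -p "$dst/${obj%/*}"
        ln "$src/$obj" "$dst/$obj" 2>/dev/null || cp "$src/$obj" "$dst/$obj"
    done
}
```

Replacing `ln` with `ln -s` would give the symlinking variant Petr mentions for speed, at the cost of the destination breaking if the source repository is pruned or moved.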
* Re: [ANNOUNCE] Cogito-0.12
  2005-07-08 8:14 ` Petr Baudis
@ 2005-07-08 15:56 ` Daniel Barkalow
  0 siblings, 0 replies; 66+ messages in thread
From: Daniel Barkalow @ 2005-07-08 15:56 UTC
  To: Petr Baudis; +Cc: Linus Torvalds, Tony Luck, Junio C Hamano, git

On Fri, 8 Jul 2005, Petr Baudis wrote:

> It seems like the whole pull family is totally borked now, and I'm
> getting desperate. Looks like this evening will be *pull.c fixing for
> me.
>
>   Jul 04 Daniel Barkalow [PATCH 0/2] Support for transferring pack files in git-ssh-*
>
> is what brings some hope to my life, though. Daniel? Any chance we could
> get the similar fixes for local-pull? (I didn't actually look at the
> patch but briefly.)

This patch is not actually for transferring objects which are in pack
files in the source, but for transferring a group of objects as a pack
file. It does, however, read the source side with git-pack-objects to
generate the content to send, so it would, I guess, fix the problem for
the case where it decides to use a pack to transfer.

The real fix is to go through the pull methods (local-pull and ssh-pull;
http-pull presumably won't be encountering pack files yet) and make them
do appropriate things with pack files. One thing that is in the patch is
a change to the comment, specifying that fetch() could also get other
objects in addition to the one specified, if there's some reason to
think this is a good idea; the fix for local-pull is probably to
link/symlink/copy the pack file if the object is in one. For ssh-pull,
serve_object in ssh-push needs to be taught how to extract an object
from a pack file and send it.

However, there's a bug in pull.c, covering up a terrible performance
issue: it doesn't actually make sure you have all the parents of a
commit that you had when it checked (due to not having a way of caching
the result of checking this, which would require you to put the entire
repository through cache each time you pull). This would mean that, if
you have a pack that references something outside of it, you won't get
everything with my proposal above.

I should be able to spend some time on these issues over the weekend.

	-Daniel
*This .sig left intentionally blank*
* Re: [ANNOUNCE] Cogito-0.12
  2005-07-03 23:46 [ANNOUNCE] Cogito-0.12 Petr Baudis
  2005-07-06 12:01 ` Brian Gerst
@ 2005-07-07 6:22 ` Chris Wright
  1 sibling, 0 replies; 66+ messages in thread
From: Chris Wright @ 2005-07-07 6:22 UTC
  To: Petr Baudis; +Cc: git

* Petr Baudis (pasky@suse.cz) wrote:
> I'm happy to announce the release of the 0.12 version of the Cogito
> SCM-like layer over Linus' GIT tree history storage tool. Get it at
>
>   http://www.kernel.org/pub/software/scm/cogito/

RPMs uploading to:

  http://www.kernel.org/pub/software/scm/cogito/RPMS

thanks,
-chris
end of thread

Thread overview: 66+ messages
2005-07-03 23:46 [ANNOUNCE] Cogito-0.12 Petr Baudis
2005-07-06 12:01 ` Brian Gerst
2005-07-07 14:45 ` Petr Baudis
2005-07-07 17:21 ` Junio C Hamano
2005-07-07 19:04 ` Linus Torvalds
2005-07-07 19:57 ` Junio C Hamano
2005-07-07 21:58 ` Linus Torvalds
2005-07-07 22:10 ` Junio C Hamano
2005-07-07 20:00 ` Junio C Hamano
2005-07-07 21:29 ` Eric W. Biederman
2005-07-07 22:23 ` Linus Torvalds
2005-07-08 2:11 ` Eric W. Biederman
2005-07-08 1:54 ` Dumb servers (was: [ANNOUNCE] Cogito-0.12) Kevin Smith
2005-07-08 2:27 ` Linus Torvalds
2005-07-07 22:14 ` [ANNOUNCE] Cogito-0.12 Petr Baudis
2005-07-07 22:52 ` Linus Torvalds
2005-07-07 23:16 ` [PATCH] Pull efficiently from a dumb git store Junio C Hamano
2005-07-07 23:50 ` [PATCH] rev-list: add "--objects=self-sufficient" flag Junio C Hamano
2005-07-07 23:58 ` Linus Torvalds
2005-07-08 1:02 ` [PATCH] rev-list: add "--full-objects" flag Junio C Hamano
2005-07-08 1:33 ` Linus Torvalds
2005-07-08 1:46 ` Linus Torvalds
2005-07-08 2:17 ` Junio C Hamano
2005-07-08 2:39 ` Linus Torvalds
2005-07-09 21:09 ` Eric W. Biederman
2005-07-10 5:11 ` Linus Torvalds
2005-07-10 6:28 ` Junio C Hamano
2005-07-10 21:48 ` Sven Verdoolaege
2005-07-10 22:36 ` Linus Torvalds
2005-07-11 15:19 ` Eric W. Biederman
2005-07-11 16:38 ` Linus Torvalds
2005-07-12 0:44 ` Eric W. Biederman
2005-07-12 1:14 ` Linus Torvalds
2005-07-12 2:38 ` Eric W. Biederman
2005-07-12 3:21 ` Linus Torvalds
2005-07-12 3:39 ` Eric W. Biederman
2005-07-12 4:48 ` Linus Torvalds
2005-07-11 17:53 ` Linus Torvalds
[not found] ` <7vy88gzn6s.fsf@assigned-by-dhcp.cox.net>
[not found] ` <Pine.LNX.4.58.0507082109140.17536@g5.osdl.org>
[not found] ` <7vfyumj8hn.fsf_-_@assigned-by-dhcp.cox.net>
2005-07-11 7:00 ` [PATCH] Check packs and then files Junio C Hamano
2005-07-08 1:03 ` [PATCH] Give --full-objects flag to rev-list when preparing a dumb server Junio C Hamano
2005-07-07 23:50 ` [PATCH] Use --objects=self-sufficient flag to rev-list Junio C Hamano
2005-07-07 23:52 ` [ANNOUNCE] Cogito-0.12 Tony Luck
2005-07-07 23:54 ` Junio C Hamano
2005-07-07 23:59 ` Linus Torvalds
2005-07-08 0:09 ` Tony Luck
2005-07-08 0:23 ` Linus Torvalds
2005-07-09 21:58 ` Russell King
2005-07-09 22:29 ` Russell King
2005-07-09 23:46 ` Junio C Hamano
2005-07-10 5:02 ` Linus Torvalds
2005-07-10 5:15 ` Linus Torvalds
2005-07-10 6:55 ` Russell King
2005-07-10 7:15 ` Junio C Hamano
2005-07-10 12:46 ` Russell King
2005-07-10 16:51 ` Linus Torvalds
2005-07-10 19:15 ` Russell King
2005-07-10 20:03 ` Linus Torvalds
2005-07-10 20:32 ` Russell King
2005-07-10 21:40 ` Linus Torvalds
2005-07-10 8:09 ` Russell King
2005-07-10 14:59 ` Petr Baudis
2005-07-11 20:30 ` Chris Wright
2005-07-08 0:09 ` Linus Torvalds
2005-07-08 8:14 ` Petr Baudis
2005-07-08 15:56 ` Daniel Barkalow
2005-07-07 6:22 ` Chris Wright