* git-daemon on NSLU2
@ 2007-08-24 5:54 Jon Smirl
  2007-08-24 6:21 ` Shawn O. Pearce
  0 siblings, 1 reply; 30+ messages in thread
From: Jon Smirl @ 2007-08-24 5:54 UTC
To: Git Mailing List

Any ideas on why git protocol clone is failing?

2007-08-24_20:51:33.85649 [9758] Connection from 72.74.92.181:19367
2007-08-24_20:51:33.85828 [9758] Extended attributes (33 bytes) exist <host=git.jonsmirl.is-a-geek.net>
2007-08-24_20:51:33.96990 [9758] Request upload-pack for '/home/git/mpc5200b.git'
2007-08-24_20:51:45.00789 fatal: Out of memory? mmap failed: Cannot allocate memory
2007-08-24_20:51:45.08746 error: git-upload-pack: git-rev-list died with error.
2007-08-24_20:51:45.08771 fatal: git-upload-pack: aborting due to possible repository corruption on the remote side.

NSLU2 ($70) is 266MHz ARM with 32MB memory.
It's running Debian on a 250GB disk with 180MB swap.

Watching top the process runs up to about 60MB in virtual size and exits.
Setting the window down made no difference packedGitWindowSize = 4194304

-- 
Jon Smirl
jonsmirl@gmail.com
* Re: git-daemon on NSLU2
From: Shawn O. Pearce @ 2007-08-24 6:21 UTC
To: Jon Smirl; +Cc: Git Mailing List

Jon Smirl <jonsmirl@gmail.com> wrote:
> Any ideas on why git protocol clone is failing?
>
> 2007-08-24_20:51:33.85649 [9758] Connection from 72.74.92.181:19367
> 2007-08-24_20:51:33.85828 [9758] Extended attributes (33 bytes) exist
> <host=git.jonsmirl.is-a-geek.net>
> 2007-08-24_20:51:33.96990 [9758] Request upload-pack for
> '/home/git/mpc5200b.git'
> 2007-08-24_20:51:45.00789 fatal: Out of memory? mmap failed: Cannot
> allocate memory
> 2007-08-24_20:51:45.08746 error: git-upload-pack: git-rev-list died with error.
> 2007-08-24_20:51:45.08771 fatal: git-upload-pack: aborting due to
> possible repository corruption on the remote side.
>
> NSLU2 ($70) is 266MHz ARM with 32MB memory.
> It's running Debian on a 250GB disk with 180MB swap.
>
> Watching top the process runs up to about 60MB in virtual size and exits.
> Setting the window down made no difference packedGitWindowSize = 4194304

ulimits? packedGitLimit may also need to be decreased?

Though we always try to free unused windows before we declare
we are out of memory...

-- 
Shawn.
* Re: git-daemon on NSLU2
From: Jon Smirl @ 2007-08-24 19:38 UTC
To: Shawn O. Pearce; +Cc: Git Mailing List

I'm still trying to debug git-daemon.

I do find it surprising that git-index-pack can't be happy within
20MB of RAM and has to continuously swap its 30MB of virtual. My
disk is chattering itself to death. It stayed that way for 40 minutes.

I'm practicing on the kernel tree.

-- 
Jon Smirl
jonsmirl@gmail.com
* Re: git-daemon on NSLU2
From: Nicolas Pitre @ 2007-08-24 20:23 UTC
To: Jon Smirl; +Cc: Shawn O. Pearce, Git Mailing List

On Fri, 24 Aug 2007, Jon Smirl wrote:

> I'm still trying to debug git-daemon.
>
> I do find it surprising that git-index-pack can't be happy within
> 20MB of RAM and has to continuously swap its 30MB of virtual. My
> disk is chattering itself to death. It stayed that way for 40 minutes.
>
> I'm practicing on the kernel tree.

You hope for miracles, do you? ;-)

Please stop hammering that poor little NSLU2 with such a workset, or
hack some additional 224MB of RAM into it. There is no magical
solution.


Nicolas
* Re: git-daemon on NSLU2
From: Jon Smirl @ 2007-08-24 21:17 UTC
To: Nicolas Pitre; +Cc: Shawn O. Pearce, Git Mailing List

On 8/24/07, Nicolas Pitre <nico@cam.org> wrote:
> On Fri, 24 Aug 2007, Jon Smirl wrote:
>
> > I'm still trying to debug git-daemon.
> >
> > I do find it surprising that git-index-pack can't be happy within
> > 20MB of RAM and has to continuously swap its 30MB of virtual. My
> > disk is chattering itself to death. It stayed that way for 40 minutes.
> >
> > I'm practicing on the kernel tree.
>
> You hope for miracles, do you? ;-)

We're doing something wrong in git-daemon. I can clone the tree in
five minutes using the http protocol. Using the git protocol would
take 24hrs if I let it finish.

> Please stop hammering that poor little NSLU2 with such a workset, or
> hack some additional 224MB of RAM into it. There is no magical
> solution.
>
>
> Nicolas
>

-- 
Jon Smirl
jonsmirl@gmail.com
* Re: git-daemon on NSLU2
From: Nicolas Pitre @ 2007-08-24 21:54 UTC
To: Jon Smirl; +Cc: Shawn O. Pearce, Git Mailing List

On Fri, 24 Aug 2007, Jon Smirl wrote:

> On 8/24/07, Nicolas Pitre <nico@cam.org> wrote:
> > On Fri, 24 Aug 2007, Jon Smirl wrote:
> >
> > > I'm still trying to debug git-daemon.
> > >
> > > I do find it surprising that git-index-pack can't be happy within
> > > 20MB of RAM and has to continuously swap its 30MB of virtual. My
> > > disk is chattering itself to death. It stayed that way for 40 minutes.
> > >
> > > I'm practicing on the kernel tree.
> >
> > You hope for miracles, do you? ;-)
>
> We're doing something wrong in git-daemon. I can clone the tree in
> five minutes using the http protocol. Using the git protocol would
> take 24hrs if I let it finish.

The http protocol is merely a dumb file copy with no packing
optimization whatsoever. The native protocol does a whole lot more
to provide clients with only the minimum data needed.

Try running "git repack -a" directly on the NSLU2. You should have the
same performance problems as with a clone.


Nicolas
* Re: git-daemon on NSLU2
From: Jon Smirl @ 2007-08-24 22:06 UTC
To: Nicolas Pitre; +Cc: Shawn O. Pearce, Git Mailing List

On 8/24/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> We're doing something wrong in git-daemon. I can clone the tree in
> five minutes using the http protocol. Using the git protocol would
> take 24hrs if I let it finish.

20Mb/s to kernel.org
time git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
real 2m34.629s

20Mb/s to kernel.org
time git clone http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
real 3m52.203s

Same kernel from my NSLU2 over http (100Mb/s)
time git clone http://jonsmirl.is-a-geek.net/apache2-default/mpc.git
real 2m36.227s

Using git protocol to the NSLU2 takes 24hrs.

On 8/24/07, Nicolas Pitre <nico@cam.org> wrote:
> Try running "git repack -a" directly on the NSLU2. You should have the
> same performance problems as with a clone.

This is true, it would take over 24hrs to finish.

Is there a reason why initial clone hasn't been special cased? Why
can't initial clone just blast over the pack file already sitting on
the disk?

I also wonder if a little application of some sorting to in-memory
data structures could help with the random IO patterns. I'm getting
the same data out of a stupid HTTP server and it doesn't go all IO
bound on me, so a solution has to be possible.

-- 
Jon Smirl
jonsmirl@gmail.com
* Re: git-daemon on NSLU2
From: Jakub Narebski @ 2007-08-24 22:39 UTC
To: git

Jon Smirl wrote:

> On 8/24/07, Nicolas Pitre <nico@cam.org> wrote:
>> Try running "git repack -a" directly on the NSLU2. You should have the
>> same performance problems as with a clone.
>
> This is true, it would take over 24hrs to finish.
>
> Is there a reason why initial clone hasn't been special cased? Why
> can't initial clone just blast over the pack file already sitting on
> the disk?

There was an idea to special case clone (just concatenate the packs, the
receiving side as someone told there can detect pack boundaries; do not
forget to pack loose objects, first), instead of using generic fetch --all
for clone, but no code. Code speaks louder than words (although if someone
would provide details of pack boundary detection...)

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: git-daemon on NSLU2
From: Junio C Hamano @ 2007-08-24 22:59 UTC
To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> There was an idea to special case clone (just concatenate the packs, the
> receiving side as someone told there can detect pack boundaries; do not
> forget to pack loose objects, first), instead of using generic fetch --all
> for clone, but no code. Code speaks louder than words (although if someone
> would provide details of pack boundary detection...)

I have to say that the "although ..." part of that statement
disqualifies this from being called an "idea".

Really, I find that you (yes, in this case I am not generalizing
but talking specifically about you) tend to overuse the word
"idea" when you talk about things that are not even at that stage
yet.
* Re: git-daemon on NSLU2
From: Jakub Narebski @ 2007-08-24 23:21 UTC
To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>
>> There was an idea to special case clone (just concatenate the packs, the
>> receiving side as someone told there can detect pack boundaries; do not
>> forget to pack loose objects, first), instead of using generic fetch --all
>> for clone, but no code. Code speaks louder than words (although if someone
>> would provide details of pack boundary detection...)
>
> I have to say that "although ..." part of that statement
> disqualifies this to be called an "idea".

Ermm... if I remember correctly, during the discussion (a single subthread)
details were provided, or at least an idea, of how to separate
concatenated packs into individual packs. Unfortunately I haven't saved
the message, and do not remember enough of it to search the archives...

I should have written "remind" instead of "provide" there...

> Really, I find that you (yes, in this case I am not generalizing
> but talking specifically about you) tend to overuse the word
> "idea" when you talk things that are not yet even at that stage
> yet.

I'm not a native English speaker... ;-) Seriously, it's a fault of mine...

-- 
Jakub Narebski
Poland
* Re: git-daemon on NSLU2
From: Jon Smirl @ 2007-08-24 23:46 UTC
To: Jakub Narebski; +Cc: git

On 8/24/07, Jakub Narebski <jnareb@gmail.com> wrote:
> There was an idea to special case clone (just concatenate the packs, the
> receiving side as someone told there can detect pack boundaries; do not
> forget to pack loose objects, first), instead of using generic fetch --all
> for clone, but no code. Code speaks louder than words (although if someone
> would provide details of pack boundary detection...)

A related concept: initial clone of a repository does the equivalent
of repack -a on the repo before transmitting it. Why aren't we saving
those results by switching the repo onto the new pack file? Then the
next clone that comes along won't have to do anything but send the
file.

But this logic can be flipped around: if the remote needs any object
from the pack file, just send them the whole pack file and let the
remote sort it out. Using this logic you can still minimize the IO
statistically.

When a remote does a fetch you have to pack all of the loose objects.
When the loose object pile reaches 20MB or so, the fetch can trigger a
repack of the oldest half into a pack that is kept by the tree and
replaces those older loose objects. For future fetches simply apply
the rule of sending the whole pack if any object is needed.

The repack of the 10MB of older objects can be kicked out to another
process and copied into the tree when it is finished. At that point
the loose objects can be deleted. The git db can tolerate a process
copying in a new packfile and deleting the old objects while other
processes may be using the database, right?

This model shouldn't statistically change the amount of data very
much. If you haven't synced your tree in a month a few too many
objects may get sent to you. However, it should dramatically reduce
the IO load on the server caused by git protocol initial clones.

-- 
Jon Smirl
jonsmirl@gmail.com
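
The rule proposed in this message (send a whole pack whenever the client needs any object in it) can be sketched in a few lines of Python. This is an illustration only, not how git-daemon works: the pack names and object ids are made up, and packs are modelled as plain sets of ids.

```python
def packs_to_send(packs, wanted):
    """Return the names of packs that contain at least one wanted object.

    packs  -- dict mapping a (hypothetical) pack name to the set of object ids in it
    wanted -- set of object ids the client is missing
    """
    return sorted(name for name, objects in packs.items() if objects & wanted)


def extra_objects(packs, wanted):
    """Object ids that get sent only because they share a pack with a wanted one."""
    sent = set()
    for name in packs_to_send(packs, wanted):
        sent |= packs[name]
    return sent - wanted


# Toy repository: one old pack and one recent pack (ids are invented).
packs = {
    "pack-old": {"a1", "a2", "a3"},
    "pack-recent": {"b1", "b2"},
}
wanted = {"b2"}

print(packs_to_send(packs, wanted))
print(sorted(extra_objects(packs, wanted)))
```

The second function makes the trade-off visible: the server does no per-object work, and the cost is the set of unwanted objects that travel along with each selected pack.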
* Re: git-daemon on NSLU2
From: Junio C Hamano @ 2007-08-25 0:04 UTC
To: Jon Smirl; +Cc: Jakub Narebski, git

"Jon Smirl" <jonsmirl@gmail.com> writes:

> On 8/24/07, Jakub Narebski <jnareb@gmail.com> wrote:
>> There was an idea to special case clone (just concatenate the packs, the
>> receiving side as someone told there can detect pack boundaries; do not
>> forget to pack loose objects, first), instead of using generic fetch --all
>> for clone, but no code. Code speaks louder than words (although if someone
>> would provide details of pack boundary detection...)
>
> A related concept: initial clone of a repository does the equivalent
> of repack -a on the repo before transmitting it. Why aren't we saving
> those results by switching the repo onto the new pack file? Then the
> next clone that comes along won't have to do anything but send the
> file.

If the majority of the access to your repository is the initial
clone request, then it might be a worthwhile thing to do. In fact
didn't we use to have such a "pre-prepared pack" support?

But I do not think "majority is initial clone" is the norm.

Even among the people who do an "initial clone" (from the
end-user perspective), what they do may not be the initial full
clone your special hack helps (and that was one of the reasons
we dropped the pre-prepared pack support --- "been there, done
that" to some extent).

 - If your client "clone"s only a single branch by doing:

	$ git init
	$ git remote add origin $remote_url
	$ git pull origin master

   the set of objects you need to send would be different
   (slightly smaller) than the normal clone.

 - Another example would be a client that uses --reference:

	$ git clone --reference neigh.git git://yourbox/repo.git

   which would give you a request that is different from the
   usual initial full clone request.
* Re: git-daemon on NSLU2
From: David Kastrup @ 2007-08-25 7:12 UTC
To: Junio C Hamano; +Cc: Jon Smirl, Jakub Narebski, git

Junio C Hamano <gitster@pobox.com> writes:

> "Jon Smirl" <jonsmirl@gmail.com> writes:
>
>> On 8/24/07, Jakub Narebski <jnareb@gmail.com> wrote:
>>> There was an idea to special case clone (just concatenate the packs, the
>>> receiving side as someone told there can detect pack boundaries; do not
>>> forget to pack loose objects, first), instead of using generic fetch --all
>>> for clone, but no code. Code speaks louder than words (although if someone
>>> would provide details of pack boundary detection...)
>>
>> A related concept: initial clone of a repository does the equivalent
>> of repack -a on the repo before transmitting it. Why aren't we saving
>> those results by switching the repo onto the new pack file? Then the
>> next clone that comes along won't have to do anything but send the
>> file.
>
> If the majority of the access to your repository is the initial
> clone request, then it might be a worthwhile thing to do. In fact
> didn't we use to have such a "pre-prepared pack" support?
>
> But I do not think "majority is initial clone" is the norm.

Well, as long as the majority is not affected negatively, catering
for a minority better is a strict improvement. Most repositories will
never get cloned and won't be affected. But there are some
repositories with a non-trivial amount of cloning.

> Even among the people who do an "initial clone" (from the
> end-user perspective), what they do may not be the initial full
> clone your special hack helps (and that was one of the reasons
> we dropped the pre-prepared pack support --- "been there, done
> that" to some extent).

If it doesn't get used, its presence does no harm, except of course
for having to be maintained and tested.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
* Re: git-daemon on NSLU2
From: Salikh Zakirov @ 2007-08-25 17:02 UTC
To: git

Junio C Hamano wrote:
> But I do not think "majority is initial clone" is the norm.
> Even among the people who do an "initial clone" (from the
> end-user perspective), what they do may not be the initial full
> clone your special hack helps (and that was one of the reasons
> we dropped the pre-prepared pack support --- "been there, done
> that" to some extent).

FWIW, at my previous job the release engineering team used git in a
special way involving lots of initial clones.

The project itself was kept under SVN, and several machines were doing
continuous builds, starting from scratch. Unfortunately, doing
from-scratch checkouts from SVN was not an option because of the high
SVN checkout overhead, so the machines did a git-clone of the imported
repository instead.

Obviously using --reference would have saved even more on the initial
clone, but the release team, consisting of a pregnant woman and an
intern student, had neither the time nor the inclination to learn git
any deeper than was strictly necessary to get the job done.
Apparently, pure git-clone performance was good enough.
* Re: git-daemon on NSLU2
From: Nicolas Pitre @ 2007-08-25 0:10 UTC
To: Jon Smirl; +Cc: Shawn O. Pearce, Git Mailing List

On Fri, 24 Aug 2007, Jon Smirl wrote:

> On 8/24/07, Nicolas Pitre <nico@cam.org> wrote:
> > Try running "git repack -a" directly on the NSLU2. You should have the
> > same performance problems as with a clone.
>
> This is true, it would take over 24hrs to finish.
>
> Is there a reason why initial clone hasn't been special cased? Why
> can't initial clone just blast over the pack file already sitting on
> the disk?

What is the gain? You'll get back to the same performance problem
eventually with some fetch operation, unless you intend to serve
clients with the whole pack every time just like the http protocol
does.

Also you don't want people cloning from you getting stuff that sits
in your reflog. The native protocol makes sure that only the needed
objects are sent over and no more.

> I also wonder if a little application of some sorting to in-memory
> data structures could help with the random IO patterns. I'm getting
> the same data out of a stupid HTTP server and it doesn't go all IO
> bound on me, so a solution has to be possible.

The http application is, indeed, stupid. It performs no reachability
analysis, no repacking, no nothing except copying the bits over.

And yes I did add some sorting optimizations in this round, so if you
try 1.5.3-* you should have them. But there is a limit to what can be
done.

Point is, if you want serious Git serving, and not only _dumb_
protocols (http is one of them), then you need more RAM. The NSLU2 is
cool, but maybe not appropriate for serving the Linux kernel natively
with Git.


Nicolas
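
The reachability analysis mentioned in this message can be illustrated with a small graph walk. This is a toy sketch in Python, not git's implementation: the object ids and the `edges` table are invented, and real commits, trees and blobs are reduced to nodes in a dictionary.

```python
def reachable(refs, edges):
    """Return every object id reachable from the given ref tips.

    refs  -- iterable of object ids that branches or tags point at
    edges -- dict mapping an object id to the ids it references
             (commit -> parents and tree, tree -> its entries)
    """
    seen = set()
    stack = list(refs)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(edges.get(obj, ()))
    return seen


# Toy object graph; all ids are made up.
edges = {
    "commit2": ["commit1", "tree2"],
    "commit1": ["tree1"],
    "tree2": ["blob-a", "blob-b"],
    "tree1": ["blob-a"],
    "rebased-away": ["tree-old"],   # only referenced from a reflog
    "tree-old": ["blob-secret"],
}

to_send = reachable(["commit2"], edges)
print(sorted(to_send))
```

Objects that only a reflog entry points at (`rebased-away` and everything below it here) never enter the result, which is the property a plain file copy of the pack cannot give. The price is that the whole graph has to be visited, and the `seen` set grows with the number of objects.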
* Re: git-daemon on NSLU2
From: Linus Torvalds @ 2007-08-24 23:28 UTC
To: Jon Smirl; +Cc: Nicolas Pitre, Shawn O. Pearce, Git Mailing List

On Fri, 24 Aug 2007, Jon Smirl wrote:
>
> We're doing something wrong in git-daemon.

Nope. Or rather, it's mostly by design.

> I can clone the tree in five minutes using the http protocol. Using the
> git protocol would take 24hrs if I let it finish.

The http side doesn't actually do any global verification, the way
git-daemon does. So to it, everything is just temporary buffers, and you
don't need any memory at all, really.

git-daemon will create a packfile. That means that it has to generate the
*global* object reachability, and will then optimize the object packing
etc etc. That's a minimum of something like 48 bytes per object for just
the object chains, and the kernel has a *lot* of objects (over half a
million).

In addition to the object chains themselves, the native protocol will also
obviously have to actually *look* at and parse all the tree and commit
objects while it does all this, so while it doesn't necessarily keep all
of those in memory all the time, it will need to access them, and if you
don't have enough memory to cache them, that will add its own set of IO.

So I haven't checked exactly how much memory you really want to have to
serve big projects, but with some handwavy guesstimate, if you actually
want to do a good job I'd guess that you really want to have at least as
much memory as the size of the largest project you are serving, and
probably add at least 10-20% on top of that.

So for the kernel, at a guess, you'd probably want to have at least 256MB
of RAM to do a half-way good job. 512MB is likely nicer and allows you to
actually cache the stuff over multiple accesses.

But I haven't actually tested. Maybe it might be bearable at 128M.

		Linus
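
The arithmetic behind this estimate can be written out explicitly. The sketch below (Python) uses only figures quoted in the thread: 48 bytes per object, half a million objects taken as a lower bound, and the 32MB of RAM reported for the NSLU2. The 200MB project size passed to the second function is the pack size Jon Smirl mentions elsewhere in the thread and is used here as an assumption.

```python
BYTES_PER_OBJECT = 48      # "something like 48 bytes per object"
OBJECT_COUNT = 500_000     # "over half a million", used as a lower bound
NSLU2_RAM_MB = 32          # RAM of the NSLU2 as reported in the thread


def object_chain_mb(objects, bytes_per_object=BYTES_PER_OBJECT):
    """Lower bound, in MiB, on the memory needed for the object chains alone."""
    return objects * bytes_per_object / (1024 * 1024)


def suggested_ram_mb(project_mb):
    """Project size plus 10% and plus 20%, following the rule of thumb above."""
    return project_mb * 110 // 100, project_mb * 120 // 100


chain_mb = object_chain_mb(OBJECT_COUNT)
print(f"object chains: at least {chain_mb:.1f} MiB of {NSLU2_RAM_MB} MiB RAM")
print("suggested RAM for a 200MB project:", suggested_ram_mb(200))
```

The object chains alone come to roughly 23 MiB, most of the machine's 32MB before any tree or commit data is cached, and the rule of thumb gives 220 to 240MB for a 200MB project, in line with the 256MB guess.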
* Re: git-daemon on NSLU2
From: Jon Smirl @ 2007-08-25 15:44 UTC
To: Linus Torvalds, jnareb; +Cc: Nicolas Pitre, Shawn O. Pearce, Git Mailing List

On 8/24/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > I can clone the tree in five minutes using the http protocol. Using the
> > git protocol would take 24hrs if I let it finish.
>
> The http side doesn't actually do any global verification, the way
> git-daemon does. So to it, everything is just temporary buffers, and you
> don't need any memory at all, really.
>
> git-daemon will create a packfile. That means that it has to generate the
> *global* object reachability, and will then optimize the object packing
> etc etc. That's a minimum of something like 48 bytes per object for just
> the object chains, and the kernel has a *lot* of objects (over half a
> million).

A large, repeating work load is created in this process when you take
a 200MB pack, repack it to add a few loose objects and then don't save
the results. This model makes the NSLU2 unusable, but I also see it at
my shared hosting provider. Initial clones of a repo that take 3min
from kernel.org take 25min on a shared host since the RAM is not
dedicated.

There are three categories of fetches:
1) initial clone, fetch all
2) fetch recent
3) I haven't fetched in three months

99% of fetches fall in the first two categories.

A very simple solution is to sendfile() existing packs if they contain
any objects that the client wants and let the client deal with the
unwanted objects. Yes this does send extra traffic over the net, but
the only group significantly impacted is #2 which is the most
infrequent group.

Loose objects are handled as they are currently. To optimize this
scheme you need to let the loose objects build up at the server and
then periodically sweep only the older ones into a pack. Packing the
entire repo into a single pack would cause recent fetches to retrieve
the entire pack.

Initial clone can be optimized further by recognizing that the
receiving repository is empty and sending them everything; no need to
compute which objects are missing at the server. This method will
speed up initial clone since the existing pack can be immediately sent
instead of waiting on a pack file to be built. Build the loose object
pack in parallel with sending the existing packs.

I recognize that in the case of cloning a single branch or --reference
too many objects will also be transmitted, but I believe the benefits
of reducing the server load outweigh the overhead of transmitting
extra objects in this case. You can always remove the extra objects on
the client side.

On 8/24/07, Jakub Narebski <jnareb@gmail.com> wrote:
> There was an idea to special case clone (just concatenate the packs, the
> receiving side as someone told there can detect pack boundaries; do not
> forget to pack loose objects, first), instead of using generic fetch --all
> for clone, but no code. Code speaks louder than words (although if someone
> would provide details of pack boundary detection...)

Write the file name and length into the socket before sending the
pack. Use sendfile() or its current incarnation to actually send the
pack. Insert these header lines between packs.

> In addition to the object chains themselves, the native protocol will also
> obviously have to actually *look* at and parse all the tree and commit
> objects while it does all this, so while it doesn't necessarily keep all
> of those in memory all the time, it will need to access them, and if you
> don't have enough memory to cache them, that will add its own set of IO.
>
> So I haven't checked exactly how much memory you really want to have to
> serve big projects, but with some handwavy guesstimate, if you actually
> want to do a good job I'd guess that you really want to have at least as
> much memory as the size of the largest project you are serving, and
> probably add at least 10-20% on top of that.
>
> So for the kernel, at a guess, you'd probably want to have at least 256MB
> of RAM to do a half-way good job. 512MB is likely nicer and allows you to
> actually cache the stuff over multiple accesses.
>
> But I haven't actually tested. Maybe it might be bearable at 128M.
>
>                 Linus
>

-- 
Jon Smirl
jonsmirl@gmail.com
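
The framing suggested in this message (file name and length written to the socket before each pack) could look roughly like the sketch below. It is written in Python for illustration and makes several assumptions that are not in the message: the header is a single ASCII line of the form `name length`, an in-memory buffer stands in for the socket, and the pack contents are dummy bytes. A real server would hand the payload to sendfile() rather than copy it through user space; Python exposes that as `socket.sendfile()`, which is not used here.

```python
import io


def write_framed_pack(out, name, data):
    """Write one pack as a header line 'name length', then the raw bytes."""
    out.write(f"{name} {len(data)}\n".encode("ascii"))
    out.write(data)


def read_framed_packs(stream):
    """Yield (name, data) pairs until the stream is exhausted."""
    while True:
        header = stream.readline()
        if not header:
            return
        # Split on the last space so the length is always the final field.
        name, length = header.decode("ascii").rstrip("\n").rsplit(" ", 1)
        data = stream.read(int(length))
        if len(data) != int(length):
            raise ValueError("truncated pack stream")
        yield name, data


# Two dummy "packs"; the second contains a newline on purpose to show
# that the length, not the content, marks the pack boundary.
wire = io.BytesIO()
write_framed_pack(wire, "pack-1.pack", b"PACK\x00first")
write_framed_pack(wire, "pack-2.pack", b"PACK\nsecond")

wire.seek(0)
for name, data in read_framed_packs(wire):
    print(name, len(data))
```

Because each payload is length-delimited, the receiver never has to inspect pack contents to find where one pack ends and the next begins.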
* Re: git-daemon on NSLU2
From: Jeff King @ 2007-08-26 9:33 UTC
To: Jon Smirl
Cc: Linus Torvalds, jnareb, Nicolas Pitre, Shawn O. Pearce, Git Mailing List

On Sat, Aug 25, 2007 at 11:44:07AM -0400, Jon Smirl wrote:

> A very simple solution is to sendfile() existing packs if they contain
> any objects that the client wants and let the client deal with the
> unwanted objects. Yes this does send extra traffic over the net, but
> the only group significantly impacted is #2 which is the most
> infrequent group.
>
> Loose objects are handled as they are currently. To optimize this
> scheme you need to let the loose objects build up at the server and
> then periodically sweep only the older ones into a pack. Packing the
> entire repo into a single pack would cause recent fetches to retrieve
> the entire pack.

I was about to write "but then 'fetch recent' clients will have to get
the entire repo after the upstream does a 'git-repack -a -d'" but you
seem to have figured that out already.

I'm unclear: are you proposing new behavior for git-daemon in general,
or a special mode for resource-constrained servers? If general behavior,
are you suggesting that we never use 'git-repack -a' on repos which
might be cloned?

-Peff
* Re: git-daemon on NSLU2
From: Jon Smirl @ 2007-08-26 16:34 UTC
To: Jeff King
Cc: Linus Torvalds, jnareb, Nicolas Pitre, Shawn O. Pearce, Git Mailing List

On 8/26/07, Jeff King <peff@peff.net> wrote:
> On Sat, Aug 25, 2007 at 11:44:07AM -0400, Jon Smirl wrote:
>
> > A very simple solution is to sendfile() existing packs if they contain
> > any objects that the client wants and let the client deal with the
> > unwanted objects. Yes this does send extra traffic over the net, but
> > the only group significantly impacted is #2 which is the most
> > infrequent group.
> >
> > Loose objects are handled as they are currently. To optimize this
> > scheme you need to let the loose objects build up at the server and
> > then periodically sweep only the older ones into a pack. Packing the
> > entire repo into a single pack would cause recent fetches to retrieve
> > the entire pack.
>
> I was about to write "but then 'fetch recent' clients will have to get
> the entire repo after the upstream does a 'git-repack -a -d'" but you
> seem to have figured that out already.
>
> I'm unclear: are you proposing new behavior for git-daemon in general,
> or a special mode for resource-constrained servers? If general behavior,
> are you suggesting that we never use 'git-repack -a' on repos which
> might be cloned?

This would be a new general behavior. There are cases where git-daemon
is very resource hungry; rearranging things a little can remove this
need for everyone.

There are several ways to address the repack -a problem. But the
simplest solution may be the best: send existing packs only on an
initial clone. In all other cases continue with the current algorithm.
We could work on methods for making the middle case better but it is
so infrequent it is probably not worth bothering with.

Changing git-daemon only for the initial clone case also means that
people don't need to change the way they manage packs.

Posters have been saying, why worry about initial clone since it isn't
done that often. I agree that it isn't done that often, but if it is
done all on my NSLU2 it will take about 40hrs to complete. We can
easily see the impact of changing the initial clone algorithm: the
http clone takes 3min.

BTW, if the NSLU2 needs a repack -a I can do it on another machine and
copy it over. Or maybe someone will write a repack that is happy in
20MB.

The NSLU2 is a great home server, it is usually fast enough. Power
consumption is a tiny 8W, fine to leave on 24/7. My NSLU2 is as
powerful as the average desktop machine in the early 90's, how quickly
we forget.

>
> -Peff
>

-- 
Jon Smirl
jonsmirl@gmail.com
* Re: git-daemon on NSLU2 2007-08-26 16:34 ` Jon Smirl @ 2007-08-26 17:15 ` Linus Torvalds 2007-08-26 18:06 ` Jon Smirl 2007-08-26 22:24 ` Daniel Hulme 0 siblings, 2 replies; 30+ messages in thread From: Linus Torvalds @ 2007-08-26 17:15 UTC (permalink / raw) To: Jon Smirl Cc: Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce, Git Mailing List On Sun, 26 Aug 2007, Jon Smirl wrote: > > Changing git-daemon only for the initial clone case also means that > people don't need to change the way they manage packs. I do agree that we might want to do some special-case handling for the initial clone (because it *is* kind of special), but it's not necessarily as easy as just re-using an existing pack. At a minimum, we'd need to have something that knows how to make a single pack out of several packs and some loose objects. That shouldn't be *hard*, but it's certainly nontrivial, especially in the presence of the same objects possibly being available more than once in different packs. [ The "duplicate object" thing does actually happen: even if you use only "git native" protocols, you can get duplicate objects because a file was changed back to an earlier version. The incremental packs you get from push/pull'ing between two repositories try to send the minimal incremental changes, but the keyword here is _try_: they will potentially send objects that the receiver already has, if it's not obvious that the receiver has them from the "commit boundary" cases ] Maybe the client side will handle a pack with duplicate objects perfectly fine, and it's not an issue. Maybe. It might even be likely (I can't think of anything that would obviously break). But at a minimum, it would be something that needs some code on the sending side, and a lot of verification that the end result works ok on the receiving side. And there's actually a deeper problem: the current native protocol guarantees that the objects sent over are only those that are reachable. That matters. 
It matters for subtle security issues (maybe you are exporting some repository that was rebased, and has objects that you didn't *intend* to make public!), but it also matters for issues like git "alternates" files. If you only ever look at a single repo, you'll never see the alternates issue, but if you're seriously looking at serving git repositories, I don't really see the "single repo" case as being at all the most common or interesting case. And if you look at something like kernel.org, the "alternates" thing is *much* more important than how much memory git-daemon uses! Yes, kernel.org would probably be much happier if git-daemon wasn't such a memory pig occasionally, but on the other hand, the win from using alternates and being able to share 99% of all objects in all the various related kernel repositories is actually likely to be a *bigger* memory win than any git-daemon memory usage, because now the disk caching works a hell of a lot better! So it's not actually clear how the initial clone thing can be optimized on the server side. It's easier to optimize on the *client* side: just do the initial clone with rsync/http (and "git gc" it on the client afterwards), and then change it to the git native protocol after the clone. That may not sound very user-friendly, but let's face it, I think there is exactly one person in the whole universe that tries to use an NSLU2 as a git server. So the "client-side workaround" is likely to affect a very limited number of clients ;) Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
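To make the reachability guarantee described above concrete, here is a minimal Python sketch. It is not git's code: the toy object graph, the ids, and the names `OBJECTS`, `REFS` and `reachable` are all invented for illustration. It compares what a walk from the advertised refs would send with what shipping every stored object would send.

```python
from collections import deque

# Toy object store: object id -> ids it references (parents, trees, blobs).
# All ids are hypothetical; "old-commit" stands for a commit left behind
# by a rebase, which no ref points at any more.
OBJECTS = {
    "new-commit": ["base-commit", "tree-b"],
    "old-commit": ["base-commit", "tree-secret"],
    "base-commit": ["tree-a"],
    "tree-a": [],
    "tree-b": [],
    "tree-secret": [],
}

# The only ref the (hypothetical) server advertises.
REFS = {"refs/heads/master": "new-commit"}


def reachable(objects, tips):
    """Return the set of object ids reachable from the given tips."""
    seen = set()
    queue = deque(tips)
    while queue:
        oid = queue.popleft()
        if oid in seen:
            continue
        seen.add(oid)
        queue.extend(objects[oid])
    return seen


# What a reachability walk would send versus everything stored on disk.
sent_by_walk = reachable(OBJECTS, REFS.values())
sent_by_whole_store = set(OBJECTS)
leaked = sorted(sent_by_whole_store - sent_by_walk)
print(leaked)
```

In this toy graph the rebased-away commit and its tree are the only objects that differ between the two sets, which is the kind of unintended exposure the message warns about when existing packs are sent verbatim.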
* Re: git-daemon on NSLU2 2007-08-26 17:15 ` Linus Torvalds @ 2007-08-26 18:06 ` Jon Smirl 2007-08-26 18:26 ` Linus Torvalds 2007-08-26 22:24 ` Daniel Hulme 1 sibling, 1 reply; 30+ messages in thread From: Jon Smirl @ 2007-08-26 18:06 UTC (permalink / raw) To: Linus Torvalds Cc: Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce, Git Mailing List On 8/26/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > And there's actually a deeper problem: the current native protocol > guarantees that the objects sent over are only those that are reachable. > That matters. It matters for subtle security issues (maybe you are > exporting some repository that was rebased, and has objects that you > didn't *intend* to make public!), but it also matters for issues like git > "alternates" files. Are these objects visible through the other protocols? It seems dangerous to leave something on an open server that you want to keep hidden. > If you only ever look at a single repo, you'll never see the alternates > issue, but if you're seriously looking at serving git repositories, I > don't really see the "single repo" case as being at all the most common or > interesting case. > > And if you look at something like kernel.org, the "alternates" thing is > *much* more important than how much memory git-daemon uses! Yes, > kernel.org would probably be much happier if git-daemon wasn't such a > memory pig occasionally, but on the other hand, the win from using > alternates and being able to share 99% of all objects in all the various > related kernel repositories is actually likely to be a *bigger* memory win > than any git-daemon memory usage, because now the disk caching works a > hell of a lot better! Doesn't kernel.org use alternates or something equivalent for serving up all those nearly identical kernel trees? I've been handling the problem locally by using remotes and fetching all the repos I'm interested in into a single git db. 
> > So it's not actually clear how the initial clone thing can be optimized on > the server side. > > It's easier to optimize on the *client* side: just do the initial clone > with rsync/http (and "git gc" it on the client afterwards), and then > change it to the git native protocol after the clone. Even better, get them to clone from kernel.org and then just fetch in the differences from my server. It's an educational problem. How about changing initial clone to refuse to use the git protocol? > > That may not sound very user-friendly, but let's face it, I think there is > exactly one person in the whole universe that tries to use an NSLU2 as a > git server. So the "client-side workaround" is likely to affect a very > limited number of clients ;) I'll send you one and double the size of the user base. I have this fancy new 20Mb FIOS connection and I can't come up with anything to use the bandwidth on. Anyway, I already gave up and moved on to a hosting provider. Repo is here: http://git.digispeaker.com/ There's nothing there yet but a clone of the 2.6 tree. I don't think there is a solution for running a git daemon on a shared host. Petr pointed out to me that an NSLU2 is late 90's equivalent not early so my memory is faulty too. > > Linus > -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-daemon on NSLU2 2007-08-26 18:06 ` Jon Smirl @ 2007-08-26 18:26 ` Linus Torvalds 2007-08-26 19:00 ` Jon Smirl 2007-08-27 11:03 ` Theodore Tso 0 siblings, 2 replies; 30+ messages in thread From: Linus Torvalds @ 2007-08-26 18:26 UTC (permalink / raw) To: Jon Smirl Cc: Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce, Git Mailing List On Sun, 26 Aug 2007, Jon Smirl wrote: > > On 8/26/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > And there's actually a deeper problem: the current native protocol > > guarantees that the objects sent over are only those that are reachable. > > That matters. It matters for subtle security issues (maybe you are > > exporting some repository that was rebased, and has objects that you > > didn't *intend* to make public!), but it also matters for issues like git > > "alternates" files. > > Are these objects visible through the other protocols? It seems > dangerous to leave something on an open server that you want to keep > hidden. They'd be visible to any stupid walker, yes. But if you're security-conscious, you'd simply not *allow* any stupid walkers. One of the goals of "git-daemon" was to have a simple service that was "obviously secure". Now, it's debatable just how obvious the daemon is, but it really is pretty simple, and I do think it should be possible to almost statically validate that it only ever reads files, and that it will only ever read files that act like valid *git* data. Some people may care about that kind of thing. I don't know how many, but it really was one of the design criteria (which is why, for example, git daemon will just silently close the connection if it finds something fishy: no fishing expeditions with bad clients trying to figure out what files exist on a server allowed!). So the fact that a web server or rsync will expose everything is kind of irrelevant - those are *designed* to expose everything. git-daemon was designed *not* to do that. 
> Doesn't kernel.org use alternates or something equivalent for serving > up all those nearly identical kernel trees? Absolutely. And that's the point. "git-daemon" will serve a nice individualized pack, even though any particular repository doesn't have one, but is really a combination of "the base Linus pack + extensions". > > So it's not actually clear how the initial clone thing can be optimized on > > the server side. > > > > It's easier to optimize on the *client* side: just do the initial clone > > with rsync/http (and "git gc" it on the client afterwards), and then > > change it to the git native protocol after the clone. > > Even better, get them to clone from kernel.org and then just fetch in > the differences from my server. It's an educational problem. Yes. > How about changing initial clone to refuse to use the git protocol? Absolutely not. It's quite often the best one to use (the ssh protocol has the exact same issues, and is the only secure protocol). But on an NSLU2, maybe *you* want to make your server side refuse it? It would be easy enough: if the client doesn't report any existing SHA1's, you just say "I'm not going to work with you". Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-daemon on NSLU2 2007-08-26 18:26 ` Linus Torvalds @ 2007-08-26 19:00 ` Jon Smirl 2007-08-26 20:19 ` Linus Torvalds 2007-08-27 11:03 ` Theodore Tso 1 sibling, 1 reply; 30+ messages in thread From: Jon Smirl @ 2007-08-26 19:00 UTC (permalink / raw) To: Linus Torvalds Cc: Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce, Git Mailing List On 8/26/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Doesn't kernel.org use alternates or something equivalent for serving > > up all those nearly identical kernel trees? > > Absolutely. And that's the point. "git-daemon" will serve a nice > individualized pack, even though any particular repository doesn't have > one, but is really a combination of "the base Linus pack + extensions". A really simple change to the git protocol would be to make the client loop on the request. On the first request the server would see that the client has no objects and send the "base Linus pack". The client would then loop around and repeat the process which will trigger the current pack building process. Do pack files contain enough information about the heads of the object chains for this to work? The client needs to be able to determine its state after receiving the pack and send the info back in the next round. I'm not buying the security argument. If you want something kept hidden get it out of the public db. If I know the sha of the hidden object can't I just add a head for it and git-daemon will happily send it and the chain up to it to me? -- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-daemon on NSLU2 2007-08-26 19:00 ` Jon Smirl @ 2007-08-26 20:19 ` Linus Torvalds 2007-08-26 21:22 ` Junio C Hamano 0 siblings, 1 reply; 30+ messages in thread From: Linus Torvalds @ 2007-08-26 20:19 UTC (permalink / raw) To: Jon Smirl Cc: Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce, Git Mailing List On Sun, 26 Aug 2007, Jon Smirl wrote: > > A really simple change to the git protocol would be to make the client > loop on the request. On the first request the server would see that > the client has no objects and send the "base Linus pack". The client > would then loop around and repeat the process which will trigger the > current pack building process. Jon, just give it up. The fact is, the git protocol works the right way already. > I'm not buying the security argument. If you want something kept hidden > get it out of the public db. If I know the sha of the hidden object > can't I just add a head for it and git-daemon will happily send it and > the chain up to it to me? That's a particularly idiotic statement. If you know the SHA1, there can *by*definition* not be any hidden objects. The SHA1 depends on the object chain. Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-daemon on NSLU2 2007-08-26 20:19 ` Linus Torvalds @ 2007-08-26 21:22 ` Junio C Hamano 0 siblings, 0 replies; 30+ messages in thread From: Junio C Hamano @ 2007-08-26 21:22 UTC (permalink / raw) To: Linus Torvalds Cc: Jon Smirl, Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce, Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: >> I'm not buying the security argument. If you want something kept hidden >> get it out of the public db. If I know the sha of the hidden object >> can't I just add a head for it and git-daemon will happily send it and >> the chain up to it to me? > > That's a particularly idiotic statement. > > If you know the SHA1, there can *by*definition* not be any hidden objects. > The SHA1 depends on the object chain. I think what we have is even stronger --- upload-pack does not allow asking for an arbitrary commit. The requesting fetch-pack side needs to pick from what is offered, and upload-pack makes sure of that. ^ permalink raw reply [flat|nested] 30+ messages in thread
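The check described above, where the requesting side may only pick from what the server offered, can be sketched in a few lines of Python. This is only an illustration of the idea, not upload-pack's implementation; the function name `validate_wants`, the ref names and the ids are hypothetical.

```python
def validate_wants(advertised_refs, wants):
    """Split requested ids into those the server advertised and those it did not.

    advertised_refs: dict mapping ref name -> object id the server offered.
    wants: list of object ids the client asked for.
    """
    offered = set(advertised_refs.values())
    accepted = [w for w in wants if w in offered]
    rejected = [w for w in wants if w not in offered]
    return accepted, rejected


# Hypothetical advertisement and request: the second id was never offered.
advertised = {"refs/heads/master": "id-master", "refs/tags/v1": "id-v1"}
accepted, rejected = validate_wants(advertised, ["id-master", "id-hidden"])
print(accepted, rejected)
```

A server following this rule would refuse the request as soon as `rejected` is non-empty, so knowing an object id alone is not enough to obtain an object that no advertised ref leads to.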
* Re: git-daemon on NSLU2 2007-08-26 18:26 ` Linus Torvalds 2007-08-26 19:00 ` Jon Smirl @ 2007-08-27 11:03 ` Theodore Tso 2007-08-27 16:26 ` Linus Torvalds 1 sibling, 1 reply; 30+ messages in thread From: Theodore Tso @ 2007-08-27 11:03 UTC (permalink / raw) To: Linus Torvalds Cc: Jon Smirl, Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce, Git Mailing List On Sun, Aug 26, 2007 at 11:26:07AM -0700, Linus Torvalds wrote: > > How about changing initial clone to refuse to use the git protocol? > > Absolutely not. It's quite often the best one to use (the ssh protocol > has the exact same issues, and is the only secure protocol). > > But on an NSLU2, maybe *you* want to make your server side refuse it? It > would be easy enough: if the client doesn't report any existing SHA1's, > you just say "I'm not going to work with you". What if the server sends a message which current clients interpret as an error, and which newer clients could interpret as, "do a clone from <this> URL, and then come back and talk to me". Basically an automated redirect to get the "Linus base pack" somewhere else, and then to go back to the original server. It certainly doesn't make sense to change anything about the low-level protocol, but maybe a higher level redirect would make sense, just as a user convenience thing. - Ted ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-daemon on NSLU2 2007-08-27 11:03 ` Theodore Tso @ 2007-08-27 16:26 ` Linus Torvalds 0 siblings, 0 replies; 30+ messages in thread From: Linus Torvalds @ 2007-08-27 16:26 UTC (permalink / raw) To: Theodore Tso Cc: Jon Smirl, Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce, Git Mailing List On Mon, 27 Aug 2007, Theodore Tso wrote: > > What if the server sends a message which current clients interpret as > an error, and which newer clients could interpret as, "do a clone from > <this> URL, and then come back and talk to me". Basically an > automated redirect to get the "Linus base pack" somewhere else, and > then to go back to the original server. It certainly doesn't make > sense to change anything about the low-level protocol, but maybe a > higher level redirect would make sense, just as a user convenience thing. I agree, a redirect might be a good idea regardless of whether it's something like "I'm a poor little NSLU2, please don't do anything but incremental updates", or whether it's something like "this repository has moved, use address xyz instead". And it should be pretty easy from a high-level protocol, although it does obviously need both server and client support. Linus ^ permalink raw reply [flat|nested] 30+ messages in thread
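The redirect idea discussed above could look roughly like the following server-side decision, sketched in Python. Nothing here exists in git: the `Response` type, the `respond` function, the `seed_url` parameter and the example URL are assumptions made purely to illustrate a server that hands clients with no objects a seed location and serves everyone else normally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    action: str                 # "serve" or "redirect" (hypothetical actions)
    url: Optional[str] = None   # where to fetch the seed repository from


def respond(haves: List[str], seed_url: Optional[str]) -> Response:
    """Decide how a constrained server answers a fetch request.

    haves: object ids the client reports it already has.
    seed_url: optional location of a full copy served by a cheaper transport.
    """
    if not haves and seed_url is not None:
        # Client has nothing yet: point it at the seed, then let it come back.
        return Response("redirect", seed_url)
    # Client already has history (or no seed is configured): build a pack.
    return Response("serve")


first_contact = respond([], "http://example.org/seed.git")
incremental = respond(["id-abc"], "http://example.org/seed.git")
print(first_contact, incremental)
```

As noted elsewhere in the thread, an empty list of haves also occurs on the initial fetch of a single branch, so this test is a heuristic rather than a reliable way to recognise a clone.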
* Re: git-daemon on NSLU2 2007-08-26 17:15 ` Linus Torvalds 2007-08-26 18:06 ` Jon Smirl @ 2007-08-26 22:24 ` Daniel Hulme 1 sibling, 0 replies; 30+ messages in thread From: Daniel Hulme @ 2007-08-26 22:24 UTC (permalink / raw) To: Linus Torvalds Cc: Jon Smirl, Jeff King, jnareb, Nicolas Pitre, Shawn O. Pearce, Git Mailing List [-- Attachment #1: Type: text/plain, Size: 1057 bytes --] On Sun, Aug 26, 2007 at 10:15:24AM -0700, Linus Torvalds wrote: > It's easier to optimize on the *client* side: just do the initial clone > with rsync/http (and "git gc" it on the client afterwards), and then > change it to the git native protocol after the clone. When I was working on Xen two years ago, they did the same thing with their Mercurial repository. They had a proper repo that handled all the push and fetch traffic, and a cron job would periodically pull from that into a second repo. This second one was served by http. People were encouraged to download the seed repo and then do a fetch (from the main one) immediately. I don't know whether they still do that, but in any case it shows your idea is not unprecedented. -- Kanga said to Roo, "Drink up your milk first, dear, and talk after- wards." So Roo, who was drinking his milk, tried to say that he could do both at once... and had to be patted on the back and dried for quite a long time afterwards. A. A. Milne, 'Winnie-the-Pooh' [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: git-daemon on NSLU2 2007-08-26 9:33 ` Jeff King 2007-08-26 16:34 ` Jon Smirl @ 2007-08-27 0:14 ` Jakub Narebski 1 sibling, 0 replies; 30+ messages in thread From: Jakub Narebski @ 2007-08-27 0:14 UTC (permalink / raw) To: Jeff King, Git Mailing List Cc: Jon Smirl, Linus Torvalds, Nicolas Pitre, Shawn O. Pearce On Sun, Aug 26, 2007, Jeff King wrote: > On Sat, Aug 25, 2007 at 11:44:07AM -0400, Jon Smirl wrote: > >> A very simple solution is to sendfile() existing packs if they contain >> any objects that the client wants and let the client deal with the >> unwanted objects. Yes this does send extra traffic over the net, but >> the only group significantly impacted is #2 which is the most >> infrequent group. >> >> Loose objects are handled as they are currently. To optimize this >> scheme you need to let the loose objects build up at the server and >> then periodically sweep only the older ones into a pack. Packing the >> entire repo into a single pack would cause recent fetches to retrieve >> the entire pack. > > I was about to write "but then 'fetch recent' clients will have to get > the entire repo after the upstream does a 'git-repack -a -d'" but you > seem to have figured that out already. > > I'm unclear: are you proposing new behavior for git-daemon in general, > or a special mode for resource-constrained servers? If general behavior, > are you suggesting that we never use 'git-repack -a' on repos which > might be cloned? I think that "reuse existing packs if sensible" idea (instead of generating always new pack) is a good one, even if at first limited to the clone case. There are nevertheless a few complications. 1. When discussing this idea on git mailing list some time ago somebody said that we don't need to implement "multi pack" extension (which was at the beginning in the design, to add later, if I understand correctly), it is enough to concatenate packs. The receiving side can then detect boundaries between packs and split them appropriately. 
But is a concatenation of packs a proper pack? If not, then we can send a concatenation of packs only if the client (receiving side) understands it, and can split it; it means checking for a protocol extension... 2. How to detect that a request is for a clone? git-clone gets all remote heads and then fetches from the heads just received. But because fetching refs and fetching objects are separate, I think we cannot use this sequence to detect that we want a clone. We can use "no haves" as a heuristic to detect a clone request, but "no haves" occurs also for the initial fetch of a single branch (i.e. using the git-remote; git-fetch sequence instead of git-clone). 3. The problem with alternates mentioned by Linus is not much of a problem, as we can simply consider packs from the alternate repository/repositories. For example if we use a single alternate, we would send a concatenation of packs from this repository and from the alternate (and a pack of loose objects from this repository). We would probably want to have some heuristic (besides configuring git-daemon) to choose between reusing existing packs (and sending them concatenated), and generating a pack for sending. Note that for dumb transports we have the opposite problem and the opposite idea: we always send full packs for dumb transports; the idea was to use range downloading (available at least for http and ftp protocols) to download only the needed fragments of packs. Perhaps if some % of a pack (number of objects in the pack or size of the pack) is to be sent then we reuse the pack, and remove the objects in the pack from consideration. No idea how to implement that, though. Or if the number of objects in a pack to be sent crosses some threshold, or generating the pack/doing reachability analysis takes too long, then reuse existing packs. Or you can wait for the GitTorrent protocol to be implemented, or implement it yourself... ;-) -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 30+ messages in thread
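The "reuse a pack if some % of it is to be sent" heuristic above can be sketched as follows. This is a toy model in Python, not anything git implements: packs are plain sets of object ids, and the function name `plan_packs`, the pack names and the 0.8 threshold are assumptions chosen for the example.

```python
def plan_packs(packs, wanted, threshold=0.8):
    """Choose existing packs to send whole, and what is left to pack afresh.

    packs: dict mapping pack name -> set of object ids stored in that pack.
    wanted: set of object ids the client needs.
    threshold: minimum fraction of a pack that must be wanted to reuse it.
    Returns (names of reused packs, ids still to be packed individually).
    """
    remaining = set(wanted)
    reused = []
    for name, objs in packs.items():
        if not objs:
            continue
        fraction = len(objs & remaining) / len(objs)
        if fraction >= threshold:
            reused.append(name)
            remaining -= objs   # these objects no longer need consideration
    return reused, remaining


# Hypothetical repository: one large base pack and one small recent pack.
packs = {
    "base": {"a", "b", "c", "d", "e"},
    "recent": {"f", "g", "h", "i"},
}
reused, remaining = plan_packs(packs, {"a", "b", "c", "d", "e", "f"})
print(reused, sorted(remaining))
```

With a threshold below 1.0 a reused pack may carry objects the client did not ask for, so this trades the reachability guarantee raised earlier in the thread for lower server load; a threshold of exactly 1.0 avoids that at the cost of reusing packs less often.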
* Re: git-daemon on NSLU2 2007-08-24 19:38 ` Jon Smirl 2007-08-24 20:23 ` Nicolas Pitre @ 2007-08-24 20:27 ` Jon Smirl 1 sibling, 0 replies; 30+ messages in thread From: Jon Smirl @ 2007-08-24 20:27 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: Git Mailing List

Not sure what I did but I have git-daemon working on the NSLU2 now. It is unusable with 32MB physical memory. I am 2hrs into the clone of the kernel repository and it has only counted 9,500 objects and used 100min CPU time. There are 540,000 objects in the repository. Disk is chattering insanely, I'm way IO bound.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 6  2  37960    972    168  11952  160   64  1748    64 2224 2233  5 28  0 67
 4  2  37960   1012    176  11756  168    0  2424     0 2517 2780 10 29  0 61
 2  2  37960    944    200  11792  152    0  1456    88 2102 2067  6 21  0 73
 2  2  37960   1120    180  11620  120    0  1180     0 2106 2122  4 21  0 75
 2  2  37960   1044    180  11788   76   28  1800    28 2255 2275  7 27  0 66
 4  3  37960   1144    176  11436   68    0  1896    12 2384 2553  7 23  0 70
 4  1  37972    992    188  11932   44  188  1148   188 1910 1731  3 18  0 79
 3  2  37976    804    196  12008  336   16  2104   112 2353 2490 13 22  0 65
 2  2  37976   1068    164  11720   96    8  2008     8 2502 2731  5 36  0 59
 2  2  37976   1280    184  11528  140    8  1332    36 2054 1956  7 26  0 67
 4  2  37976   1028    200  11552  264   16   956    16 1855 1710  4 20  0 76
 2  2  37976    844    192  11680  144    8  1576     8 2206 2307  5 31  0 64
 3  1  37984   1304    172  11264   92   28  1444    52 1998 1887  5 23  0 72
 2  2  38000   1012    168  11680  124   84  1896   192 2385 2486  3 30  0 67
 5  2  38008    928    164  11916  136   20  1776    20 2256 2308 11 22  0 67
 2  3  38008   1168    184  11704  144   20  1820    32 2163 2186  5 24  0 71
 4  4  38016    816    156  11784  248   32  1828    44 2328 2422  2 24  0 74
 4  1  38020   1476    160  11448  152  104  2080   116 1925 1728  3 24  0 73
 2  5  38028    828    192  12140  240  140  1768   232 2319 2226  4 29  0 68
 2  2  38020   1136    172  11880  156   16  1764    72 2081 2020  3 20  0 77
 2  3  38060   1040    172  12016  188  140  2056   140 2180 2182  6 26  0 68

root   11241 0.3  0.0    104    24 ?     Ss 06:54 0:07 runsv git-daemon
gitlog 11242 0.0  0.1    124    40 ?     S  06:54 0:01 svlogd -tt /var/log/git-daemon
root   11335 0.0  0.4   1620   140 pts/0 S+ 06:56 0:00 strace git-daemon --verbose --export-all /home/git
root   11336 0.0  0.4   1808   144 pts/0 S+ 06:56 0:00 git-daemon --verbose --export-all /home/git
root   11344 0.1  1.0  60240   328 pts/0 S+ 06:56 0:02 /usr/local/bin/git-upload-pack --strict --timeout=0 .
root   11349 6.5 50.8 171868 15240 pts/0 D+ 06:56 2:09 /usr/local/bin/git-upload-pack --strict --timeout=0 .
root   11350 0.6 14.6  16392  4380 pts/0 S+ 06:56 0:12 /usr/local/bin git-pack-objects --stdout --progress --delta-base-offset

-- Jon Smirl jonsmirl@gmail.com ^ permalink raw reply [flat|nested] 30+ messages in thread
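For a sense of scale, the figures reported above (9,500 objects counted after 2 hours, 540,000 objects in total) can be extrapolated with a few lines of Python. This is a crude back-of-the-envelope estimate that assumes the counting rate stays constant and covers only the counting phase; the result is not a number stated anywhere in the thread.

```python
# Figures taken from the message above.
objects_counted = 9_500
hours_elapsed = 2.0
total_objects = 540_000

# Assumption: objects keep being counted at the same average rate.
rate = objects_counted / hours_elapsed      # objects per hour
estimated_hours = total_objects / rate      # hours to count every object
print(f"{estimated_hours:.1f} hours")
```

A swapping, IO-bound process is unlikely to progress at a steady rate, so this should be read as an order-of-magnitude indication only.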