* git clone: very long "resolving deltas" phase
From: Vitaly Berov @ 2010-04-06 14:18 UTC
To: git

We have quite a large repository, and "git clone" takes about 6 hours; the "resolving deltas" phase takes most of that time. What does git do at this stage, and how can we optimize it?

* Re: git clone: very long "resolving deltas" phase
From: Matthieu Moy @ 2010-04-06 15:01 UTC
To: Vitaly Berov; Cc: git

Vitaly Berov <vitaly.berov@gmail.com> writes:

> We have quite a large repository, and "git clone" takes about 6 hours;
> the "resolving deltas" phase takes most of that time.
> What does git do at this stage, and how can we optimize it?

Does running "git gc" (long, but done once and for all) on the server help?

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: git clone: very long "resolving deltas" phase
From: Vitaly Berov @ 2010-04-06 15:28 UTC
To: git

On 04/06/2010 07:01 PM, Matthieu Moy wrote:
> Vitaly Berov <vitaly.berov@gmail.com> writes:
>
>> We have quite a large repository, and "git clone" takes about 6 hours;
>> the "resolving deltas" phase takes most of that time.
>> What does git do at this stage, and how can we optimize it?
>
> Does running "git gc" (long, but done once and for all) on the server
> help?

I didn't try that one, but I'll give it a try, thanks. And what does this stage actually do?

* Re: git clone: very long "resolving deltas" phase
From: Vitaly @ 2010-04-06 15:29 UTC
To: Matthieu Moy; Cc: git

I didn't try this, but I'll give it a try, thanks.

And what does this stage mean?

On 04/06/2010 07:01 PM, Matthieu Moy wrote:
> Vitaly Berov <vitaly.berov@gmail.com> writes:
>
>> We have quite a large repository, and "git clone" takes about 6 hours;
>> the "resolving deltas" phase takes most of that time.
>> What does git do at this stage, and how can we optimize it?
>
> Does running "git gc" (long, but done once and for all) on the server
> help?

* Re: git clone: very long "resolving deltas" phase
From: Andreas Ericsson @ 2010-04-06 15:32 UTC
To: Vitaly; Cc: Matthieu Moy, git

On 04/06/2010 05:29 PM, Vitaly wrote:
> I didn't try this, but I'll give it a try, thanks.
>
> And what does this stage mean?

It means the server is busy creating a packfile to send over the wire. If you pack the repository before cloning from it, deltas from the existing packfile will simply be copied into the new pack. This provides a huge speed boost, so make sure to repack the repository on the server every once in a while.

--
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

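The kind of periodic server-side repack Andreas describes could be as simple as the following; the repository path is only a placeholder:

    $ cd /srv/git/project.git    # the bare repository on the server (example path)
    $ git repack -a -d           # put everything into a single pack, drop the redundant packs
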
* Re: git clone: very long "resolving deltas" phase
From: Vitaly Berov @ 2010-04-06 15:56 UTC
To: Shawn Pearce; Cc: Andreas Ericsson, git, Matthieu Moy

Why does git compute checksums on the client side? Aren't they already computed on the "server" side?

2010/4/6 Shawn Pearce <spearce@spearce.org>:
> Nope, the resolving deltas phase is about computing the checksums of each
> object on the client side of the connection. Repacking the server might
> have little impact on this phase, other than maybe to reduce the size and
> thus the disk I/O required to scan the entire pack.
>
> On Apr 6, 2010 9:32 AM, "Andreas Ericsson" <ae@op5.se> wrote:
>> On 04/06/2010 05:29 PM, Vitaly wrote:
>>> I didn't try this, but I'll give it a try, thanks.
>>>
>>> And what does this stage mean?
>>
>> It means the server is busy creating a packfile to send
>> over the wire. If you pack the repository before cloning
>> from it, deltas from the existing packfile will simply be copied
>> into the new pack. This provides a huge speed boost,
>> so make sure to repack the repository on the server every
>> once in a while.

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-06 21:09 UTC
To: Vitaly Berov; Cc: Shawn Pearce, Andreas Ericsson, git, Matthieu Moy

On Tue, 6 Apr 2010, Vitaly Berov wrote:

> Why does git compute checksums on the client side? Aren't they already
> computed on the "server" side?

Yes. But Git clients can't trust the server like that.

The only way to make sure the server didn't send you crap data, or worse, maliciously altered data, is not to transfer any checksum data at all, but to recompute and validate the received payload locally.

That being said, you should never have to wait 6 hours for that phase to complete. It is typically a matter of minutes, if not seconds.

Nicolas

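To illustrate what is being recomputed: a git object name is just the SHA-1 of a short type/size header followed by the object contents, so the client can derive every object name itself once the deltas are expanded. A small demonstration, with arbitrary sample content:

    $ echo 'hello' | git hash-object --stdin    # how git names a 6-byte blob
    $ printf 'blob 6\0hello\n' | sha1sum        # same hash computed by hand: SHA-1 of "blob <size>\0<content>"

Both commands print the same 40-character object ID; the receiving side does the equivalent for every object it inflates during "resolving deltas" in order to build the pack index.
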
* Re: git clone: very long "resolving deltas" phase
From: Vitaly Berov @ 2010-04-07 5:54 UTC
To: git; Cc: Shawn Pearce, Andreas Ericsson, Matthieu Moy

I suspected security was the reason.

OK, we work in a trusted environment. How can we turn this behavior off?

Vitaly

On 04/07/2010 01:09 AM, Nicolas Pitre wrote:
> On Tue, 6 Apr 2010, Vitaly Berov wrote:
>
>> Why does git compute checksums on the client side? Aren't they already
>> computed on the "server" side?
>
> Yes. But Git clients can't trust the server like that.
>
> The only way to make sure the server didn't send you crap data, or worse,
> maliciously altered data, is not to transfer any checksum data at all, but
> to recompute and validate the received payload locally.
>
> That being said, you should never have to wait 6 hours for that phase to
> complete. It is typically a matter of minutes, if not seconds.
>
> Nicolas

* Re: git clone: very long "resolving deltas" phase
From: Ilari Liusvaara @ 2010-04-07 8:00 UTC
To: Vitaly Berov; Cc: git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On Wed, Apr 07, 2010 at 09:54:29AM +0400, Vitaly Berov wrote:
> I suspected security was the reason.
>
> OK, we work in a trusted environment. How can we turn this behavior off?

It can't be turned off. The protocol requires the client to recompute the hashes, as they are not explicitly present in the transport stream (they must be inferred instead).

>> That being said, you should never have to wait 6 hours for that phase to
>> complete. It is typically a matter of minutes, if not seconds.

Reasons why it might take 6 hours (offhand, from memory):

- Extremely large repo
- Very large files in the repo pushing the client into swap

-Ilari

* Re: git clone: very long "resolving deltas" phase
From: Vitaly @ 2010-04-07 8:14 UTC
To: Ilari Liusvaara; Cc: git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

Too bad.

Yes, we really do have a very large repo with binary files. So, as far as I understand, the fastest way is to use rsync or something like it instead of "git clone".

P.S. By the way, how can I request a feature to include the hashes in the transport stream for trusted environments?

On 04/07/2010 12:00 PM, Ilari Liusvaara wrote:
> On Wed, Apr 07, 2010 at 09:54:29AM +0400, Vitaly Berov wrote:
>> I suspected security was the reason.
>>
>> OK, we work in a trusted environment. How can we turn this behavior off?
>
> It can't be turned off. The protocol requires the client to recompute the
> hashes, as they are not explicitly present in the transport stream (they
> must be inferred instead).
>
>>> That being said, you should never have to wait 6 hours for that phase to
>>> complete. It is typically a matter of minutes, if not seconds.
>
> Reasons why it might take 6 hours (offhand, from memory):
>
> - Extremely large repo
> - Very large files in the repo pushing the client into swap
>
> -Ilari

* Re: git clone: very long "resolving deltas" phase
From: Ilari Liusvaara @ 2010-04-07 9:00 UTC
To: Vitaly; Cc: git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On Wed, Apr 07, 2010 at 12:14:36PM +0400, Vitaly wrote:
> Too bad.
> Yes, we really do have a very large repo with binary files.

Large binary files are the worst. I think that disabling deltification on them ('-delta' as an attribute [*]) might actually help somewhat...

> P.S. By the way, how can I request a feature to include the hashes in the
> transport stream for trusted environments?

On this mailing list. But as a tip: don't bother. It is far too large a change relative to any possible benefit.

[*] I think 'info/attributes' on the server influences whether those objects are deltified or not.

-Ilari

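A sketch of what such server-side attributes might look like; the patterns are purely illustrative and would need to match the repository's actual binary file types:

    # $GIT_DIR/info/attributes on the server (example patterns)
    *.png   -delta
    *.psd   -delta
    *.bin   -delta

Existing packs keep their deltas, so a forced repack (e.g. "git repack -a -d -f") would be needed afterwards for the attribute to affect objects that are already packed.
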
* Re: git clone: very long "resolving deltas" phase
From: Jakub Narebski @ 2010-04-07 9:37 UTC
To: Vitaly; Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

Please do not top-post.

Vitaly <vitaly.berov@gmail.com> writes:

> Too bad.
> Yes, we really do have a very large repo with binary files.
>
> So, as far as I understand, the fastest way is to use rsync or something
> like it instead of "git clone".
>
> P.S. By the way, how can I request a feature to include the hashes in the
> transport stream for trusted environments?

If you have very large binary files, perhaps the git-bigfiles fork would help you:
http://caca.zoy.org/wiki/git-bigfiles

--
Jakub Narebski
Poland
ShadeHawk on #git

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-07 14:20 UTC
To: Vitaly; Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On Wed, 7 Apr 2010, Vitaly wrote:

> Too bad.
> Yes, we really do have a very large repo with binary files.
>
> So, as far as I understand, the fastest way is to use rsync or something
> like it instead of "git clone".

You should still be able to use 'git clone' with an rsync:// style URL.

> P.S. By the way, how can I request a feature to include the hashes in the
> transport stream for trusted environments?

As I've been trying to make you understand repeatedly now, this shouldn't be needed. A real fix for the bad behavior would be in order before papering over it.

If the large binary blobs are the source of the clone problem, then they will cause the same problems with other commands such as 'git diff' or even 'git checkout' later on. So the "feature" you're asking for is misguided.

What you might try on your client machines is this:

    git config --global core.deltaBaseCacheLimit 256m

before doing a clone.

Nicolas

* Re: git clone: very long "resolving deltas" phase
From: Vitaly @ 2010-04-07 14:35 UTC
To: Nicolas Pitre; Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On 04/07/2010 06:20 PM, Nicolas Pitre wrote:
>> P.S. By the way, how can I request a feature to include the hashes in the
>> transport stream for trusted environments?
>
> As I've been trying to make you understand repeatedly now, this shouldn't
> be needed. A real fix for the bad behavior would be in order before
> papering over it.

Nicolas, my post was written before I received your message about reproducing and "stracing" the problem. I got your point and am now reproducing the problem. My estimate is tomorrow (the repack takes quite a long time).

Vitaly

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-07 14:55 UTC
To: Vitaly; Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On Wed, 7 Apr 2010, Vitaly wrote:

> Nicolas, my post was written before I received your message about
> reproducing and "stracing" the problem. I got your point and am now
> reproducing the problem.

No problem.

> My estimate is tomorrow (the repack takes quite a long time).

The repack isn't so important. If it takes that long, you might simply interrupt it and strace the client when "resolving deltas" looks to be insanely long. In fact it is best if you don't repack, as the client needs to cope with whatever the server throws at it, and repacking your repo might hide the client-side issue.

Then playing with core.deltaBaseCacheLimit instead would be quite interesting.

Nicolas

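One way to grab such a capture might be to attach strace to the process doing the delta resolution while the clone sits in its "Resolving deltas" phase; the process name and output file below are only examples:

    $ pgrep -f 'git index-pack'                     # find the PID of the process resolving the deltas
    $ strace -p <PID> -o resolving-deltas.strace    # <PID> is a placeholder; let it run a minute, then Ctrl-C
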
* Re: git clone: very long "resolving deltas" phase
From: Vitaly Berov @ 2010-04-09 6:46 UTC
To: git; Cc: Ilari Liusvaara, Shawn Pearce, Andreas Ericsson, Matthieu Moy

Hi,

On 04/07/2010 06:55 PM, Nicolas Pitre wrote:
>
> Then playing with core.deltaBaseCacheLimit instead would be quite
> interesting.

It's difficult to play with the parameters because the "receiving objects" phase alone takes 1.5-2 hours. But I'll try "git config --global core.deltaBaseCacheLimit 256m" as you recommended.

Vitaly

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-09 19:30 UTC
To: Vitaly Berov; Cc: git, Ilari Liusvaara, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On Fri, 9 Apr 2010, Vitaly Berov wrote:

> It's difficult to play with the parameters because the "receiving objects"
> phase alone takes 1.5-2 hours.

Huh... I guess that's over 100 Mbps Ethernet?

57 GB / 1.5 h -> approx 10 MB/s

Nicolas

* Re: git clone: very long "resolving deltas" phase
From: Vitaly Berov @ 2010-04-10 6:32 UTC
To: git; Cc: Ilari Liusvaara, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On 04/09/2010 11:30 PM, Nicolas Pitre wrote:
> Huh... I guess that's over 100 Mbps Ethernet?
>
> 57 GB / 1.5 h -> approx 10 MB/s

Yes.

Vitaly

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-07 14:08 UTC
To: Ilari Liusvaara; Cc: Vitaly Berov, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On Wed, 7 Apr 2010, Ilari Liusvaara wrote:

> Reasons why it might take 6 hours (offhand, from memory):
>
> - Extremely large repo

Six hours is still way beyond the expected computational requirement. That would be an expected time for an aggressive repack, for example, where _each_ delta is attempted against up to 250 different bases. But when indexing a fetched pack, each delta is expected to be computed only once.

> - Very large files in the repo pushing the client into swap

This shouldn't happen since commit 92392b4a, which provides a cap on memory usage during the delta resolution process.

So without a look at the actual repository causing this pathological behavior, it is hard to guess what the issue and the required fix might be.

Nicolas

* Re: git clone: very long "resolving deltas" phase
From: Sverre Rabbelier @ 2010-04-07 14:29 UTC
To: Nicolas Pitre, Vitaly Berov; Cc: Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

Heya,

On Wed, Apr 7, 2010 at 09:08, Nicolas Pitre <nico@fluxnic.net> wrote:
> This shouldn't happen since commit 92392b4a, which provides a cap on
> memory usage during the delta resolution process.

Which made me think of asking:

Vitaly, what version of git are you running? Both client and server side, please.

--
Cheers,

Sverre Rabbelier

* Re: git clone: very long "resolving deltas" phase
From: Vitaly @ 2010-04-07 14:37 UTC
To: Sverre Rabbelier; Cc: Nicolas Pitre, Ilari Liusvaara, git, Shawn Pearce, Andreas Ericsson, Matthieu Moy

On 04/07/2010 06:29 PM, Sverre Rabbelier wrote:
> Vitaly, what version of git are you running? Both client and server side,
> please.

git version 1.7.0.4, both sides.

* Re: git clone: very long "resolving deltas" phase
From: Vitaly @ 2010-04-07 5:55 UTC
To: Nicolas Pitre; Cc: Shawn Pearce, Andreas Ericsson, git, Matthieu Moy

I suspected security was the reason.

OK, we work in a trusted environment. How can we turn this behavior off?

On 04/07/2010 01:09 AM, Nicolas Pitre wrote:
> On Tue, 6 Apr 2010, Vitaly Berov wrote:
>
>> Why does git compute checksums on the client side? Aren't they already
>> computed on the "server" side?
>
> Yes. But Git clients can't trust the server like that.
>
> The only way to make sure the server didn't send you crap data, or worse,
> maliciously altered data, is not to transfer any checksum data at all, but
> to recompute and validate the received payload locally.
>
> That being said, you should never have to wait 6 hours for that phase to
> complete. It is typically a matter of minutes, if not seconds.

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-07 12:42 UTC
To: Vitaly; Cc: Shawn Pearce, Andreas Ericsson, git, Matthieu Moy

On Wed, 7 Apr 2010, Vitaly wrote:

> I suspected security was the reason.
>
> OK, we work in a trusted environment. How can we turn this behavior off?

You can't. This is fundamental to the Git native protocol.

Nicolas

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-06 21:05 UTC
To: Andreas Ericsson; Cc: Vitaly, Matthieu Moy, git

On Tue, 6 Apr 2010, Andreas Ericsson wrote:

> On 04/06/2010 05:29 PM, Vitaly wrote:
>> I didn't try this, but I'll give it a try, thanks.
>>
>> And what does this stage mean?
>
> It means the server is busy creating a packfile to send
> over the wire.

No. The "Resolving deltas" phase is performed locally, when Git actually expands all the deltas in the received pack to find the SHA-1 of each resulting object in order to create the pack index.

Nicolas

* Re: git clone: very long "resolving deltas" phase
From: Marat Radchenko @ 2010-04-07 9:22 UTC
To: git

Nicolas Pitre <nico <at> fluxnic.net> writes:
> The "Resolving deltas" phase is performed locally, when Git actually
> expands all the deltas in the received pack to find the SHA-1 of each
> resulting object in order to create the pack index.

Is there any technical limitation why this cannot be done simultaneously with the fetch (piped or whatever), instead of as a separate step after the fetch?

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-07 14:40 UTC
To: Marat Radchenko; Cc: git

On Wed, 7 Apr 2010, Marat Radchenko wrote:

> Is there any technical limitation why this cannot be done simultaneously
> with the fetch (piped or whatever), instead of as a separate step after
> the fetch?

The non-delta objects are indexed simultaneously as they're received on the wire. However, doing the same for delta objects would be way suboptimal, because:

1) The base object needed to resolve a given delta object might not have been received yet. In that case the delta will have to be resolved later anyway, and finding out whether a just-received object might be a base for previously received deltas is rather costly, and even impossible if that potential base object is itself a delta. So it is best to figure out the delta dependencies only once, at the end of the transfer.

2) When resolving deep delta chains, it is best to start from the root, i.e. create the result of a delta object and then recursively resolve all deltas that use this result as their base, rather than expanding each delta chain repeatedly, which would turn this process into exponential CPU usage. Again, this can only be done once all delta objects have been received.

Nicolas

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-06 21:01 UTC
To: Matthieu Moy; Cc: Vitaly Berov, git

On Tue, 6 Apr 2010, Matthieu Moy wrote:

> Vitaly Berov <vitaly.berov@gmail.com> writes:
>
>> We have quite a large repository, and "git clone" takes about 6 hours;
>> the "resolving deltas" phase takes most of that time.
>> What does git do at this stage, and how can we optimize it?
>
> Does running "git gc" (long, but done once and for all) on the server
> help?

No, that won't help.

Nicolas

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-06 21:10 UTC
To: Vitaly Berov; Cc: git

On Tue, 6 Apr 2010, Vitaly Berov wrote:

> We have quite a large repository, and "git clone" takes about 6 hours;
> the "resolving deltas" phase takes most of that time.

This simply makes no sense.

Is this repository publicly clonable?

Nicolas

* Re: git clone: very long "resolving deltas" phase
From: Vitaly @ 2010-04-07 5:57 UTC
To: Nicolas Pitre; Cc: git

Hmm, what do you mean by "makes no sense"? It works the way it works: for several hours.

No, we work in a trusted environment. Our repository isn't open to external people.

On 04/07/2010 01:10 AM, Nicolas Pitre wrote:
> On Tue, 6 Apr 2010, Vitaly Berov wrote:
>
>> We have quite a large repository, and "git clone" takes about 6 hours;
>> the "resolving deltas" phase takes most of that time.
>
> This simply makes no sense.
>
> Is this repository publicly clonable?

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-07 12:55 UTC
To: Vitaly; Cc: git

On Wed, 7 Apr 2010, Vitaly wrote:

> Hmm, what do you mean by "makes no sense"? It works the way it works: for
> several hours.

As I said, several hours for this operation makes no sense. This should take minutes, if not seconds. *This* is what needs fixing.

> No, we work in a trusted environment. Our repository isn't open to external
> people.

I was asking because that would have helped me (or any other Git developer) analyse the issue and provide a fix.

OK then. What happens if you do the following on the server machine where the repository is stored:

    git repack -a -f -d

How long does this take?

How long does the "Resolving deltas" phase take when cloning this repacked repository? (Don't wait more than 10 minutes for it.)

If "Resolving deltas" takes more than 10 minutes, could you capture a strace dump from that process during a minute or so and post it here?

Hmmm. Is this on Linux or Windows?

Nicolas

* Re: git clone: very long "resolving deltas" phase
From: Vitaly Berov @ 2010-04-09 6:50 UTC
To: git

Hi,

On 04/07/2010 04:55 PM, Nicolas Pitre wrote:
>
> I was asking because that would have helped me (or any other Git
> developer) analyse the issue and provide a fix.
>
> OK then. What happens if you do the following on the server machine
> where the repository is stored:
>
>     git repack -a -f -d
>
> How long does this take?
>
> How long does the "Resolving deltas" phase take when cloning this repacked
> repository? (Don't wait more than 10 minutes for it.)

Nicolas, we didn't stop the process as you recommended, sorry for that.

So, the results: it took 37 hours in total, 20 hours compressing objects (delta compression using up to 4 threads) and 17 hours writing objects. Almost all of the time the bottleneck was the CPU.

Number of objects: 3997548.
Size of the repository: ~57 GB.

> If "Resolving deltas" takes more than 10 minutes, could you capture
> a strace dump from that process during a minute or so and post it here?

I'll capture an strace later.

> Hmmm. Is this on Linux or Windows?

Short spec: Ubuntu 9.04 (64-bit), Intel Core 2 Quad Q8400 @ 2.66 GHz, 8 GB of memory.

By the way, we have a large number of binary files in our repo.

Vitaly

* Re: git clone: very long "resolving deltas" phase
From: Matthieu Moy @ 2010-04-09 8:13 UTC
To: Vitaly Berov; Cc: git

Vitaly Berov <vitaly.berov@gmail.com> writes:

> Number of objects: 3997548.
> Size of the repository: ~57 GB.
[...]
> By the way, we have a large number of binary files in our repo.

This is clearly not the kind of repository Git is good at. I encourage you to continue this discussion and try to find a way to get it working, but the standard approach (probably a "my 2 cents" kind of advice, but...) would be:

* Split your repo into smaller ones (submodules...)

* Avoid versioning binary files

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-09 19:18 UTC
To: Matthieu Moy; Cc: Vitaly Berov, git

On Fri, 9 Apr 2010, Matthieu Moy wrote:

> This is clearly not the kind of repository Git is good at. I
> encourage you to continue this discussion and try to find a way to
> get it working, but the standard approach (probably a "my 2 cents"
> kind of advice, but...) would be:
>
> * Split your repo into smaller ones (submodules...)
>
> * Avoid versioning binary files

I still think that Git ought to "just work" with such a repository. There are things that should be done for that, like applying the core.bigFileThreshold configuration variable in more places, such as delta compression, object creation, diff generation, etc. Of course Git won't be as good at saving disk space in that case, but when your repo is 57 GB you probably don't care much if it grows to 80 GB while cloning it becomes twice as fast.

Yet, I still don't think the current issue with the receiving end of a clone taking 6 hours in "Resolving deltas" is normal, independently of core.bigFileThreshold.

Nicolas

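For reference, the setting Nicolas mentions is an ordinary config variable; the value below is only an example, and at the time of this thread it was honored in only some of the code paths he lists:

    $ git config core.bigFileThreshold 512m    # blobs above this size are stored without delta compression, where supported
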
* Re: git clone: very long "resolving deltas" phase
From: Vitaly Berov @ 2010-04-10 8:05 UTC
To: git

On 04/09/2010 12:13 PM, Matthieu Moy wrote:
> This is clearly not the kind of repository Git is good at.

Hmm.. I'm looking for a good version control system because I'm tired of Subversion, and Perforce isn't an option either (very expensive and even less comfortable). It seems like Git and Mercurial are the only good options. Can you recommend some other SCMs?

> I encourage you to continue this discussion and try to find a way to
> get it working, but the standard approach (probably a "my 2 cents"
> kind of advice, but...) would be:
>
> * Split your repo into smaller ones (submodules...)
>
> * Avoid versioning binary files

I can't get rid of the binary files because they are the "sources" of our artists' work (they develop a game). Splitting the repo could be an option, but it's very inconvenient for us.

Vitaly

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-09 19:25 UTC
To: Vitaly Berov; Cc: git

On Fri, 9 Apr 2010, Vitaly Berov wrote:

> So, the results: it took 37 hours in total, 20 hours compressing objects
> (delta compression using up to 4 threads) and 17 hours writing objects.
> Almost all of the time the bottleneck was the CPU.
>
> Number of objects: 3997548.
> Size of the repository: ~57 GB.

OK. You probably have a size record. :-)

How big is the .pack file in .git/objects/pack/ ?

> By the way, we have a large number of binary files in our repo.

How many? How big?

Nicolas

* Re: git clone: very long "resolving deltas" phase
From: Vitaly Berov @ 2010-04-10 7:58 UTC
To: git

On 04/09/2010 11:25 PM, Nicolas Pitre wrote:
>> Number of objects: 3997548.
>> Size of the repository: ~57 GB.
>
> OK. You probably have a size record. :-)

That's game development. We have ~100 artists who produce text and binary files as "sources". FYI, the "end client version" is ~2.5 GB.

> How big is the .pack file in .git/objects/pack/ ?

~56 GB

>> By the way, we have a large number of binary files in our repo.
>
> How many? How big?

Total number of files is ~400000, of which ~200000 are binaries. Distribution of sizes: 5% at 4 MB - 32 KB, 5% at 32 KB - 12 KB, then 12 KB - 6 KB, 6 KB - 4 KB, 4 KB - 2.5 KB, 2.5 KB - 2.3 KB, 2.3 KB - 2 KB, and the rest.

Vitaly

P.S. By the way, msysgit can't handle this repository; the blocker bug is:
http://code.google.com/p/msysgit/issues/detail?id=365&q=mmap&colspec=ID%20Type%20Status%20Priority%20Component%20Owner%20Summary
So I'm thinking about stopping the evaluation, though I like git (especially after a long Subversion experience :))

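Figures like these can be read straight out of the repository, for instance:

    $ git count-objects -v         # object counts and pack size (sizes reported in KiB)
    $ du -sh .git/objects/pack/    # on-disk size of the pack files
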
* Re: git clone: very long "resolving deltas" phase
From: Vitaly Berov @ 2010-04-10 13:25 UTC
To: git

Hi,

On 04/07/2010 04:55 PM, Nicolas Pitre wrote:
> OK then. What happens if you do the following on the server machine
> where the repository is stored:
>
>     git repack -a -f -d
>
> How long does this take?
>
> If "Resolving deltas" takes more than 10 minutes, could you capture
> a strace dump from that process during a minute or so and post it here?

Nicolas, I took an strace and sent it to you personally.

Here is an extract (99% of the strace looks the same):
--------------
access("/home/vitaly/Projects/test/a1/.git/objects/0f/9a3d28766f8b767fb64166139dd65c079512de", F_OK) = -1 ENOENT (No such file or directory)
pread(5, "x\234\324\275y\\Ni\374\377\177\256\323]\335Q\271S\332\220\"\n\241\10Q\10!$!d/\262"..., 214850, 8944159649) = 214850
access("/home/vitaly/Projects/test/a1/.git/objects/a5/5430cbc6674b56d7c2d2d81ef5b7d5c8ebdec8", F_OK) = -1 ENOENT (No such file or directory)
pread(5, "x\234\354\275\vT\224U\0270<\347\231\v\363\250\244#\f0 \"\"\"\312ETD\300af"..., 159502, 8944374506) = 159502
access("/home/vitaly/Projects/test/a1/.git/objects/e5/02b7d050d1b81ebc256234e303eac17116c9fb", F_OK) = -1 ENOENT (No such file or directory)
pread(5, "x\234\324\274yX\24G\3607>\3353\263,\310\342\"7\2.\202\342\1\10\212\210\212\10\236x\341"..., 61131, 8944534014) = 61131
access("/home/vitaly/Projects/test/a1/.git/objects/5b/6bdba61771e5ba63ba8b43659db1612345c2eb", F_OK) = -1 ENOENT (No such file or directory)
pread(5, "x\234\324\275\tX\216Y\3747~\237\323\366DOJiyR\236\210B*$!\311\276\223=[\n"..., 236685, 8944595152) = 236685
-----------------
To me, this looks very suspicious.

Vitaly

* Re: git clone: very long "resolving deltas" phase
From: Nicolas Pitre @ 2010-04-11 0:50 UTC
To: Vitaly Berov; Cc: git

On Sat, 10 Apr 2010, Vitaly Berov wrote:

> Nicolas, I took an strace and sent it to you personally.
>
> Here is an extract (99% of the strace looks the same):
> --------------
> access("/home/vitaly/Projects/test/a1/.git/objects/0f/9a3d28766f8b767fb64166139dd65c079512de", F_OK) = -1 ENOENT (No such file or directory)
> pread(5, "x\234\324\275y\\Ni\374\377\177\256\323]\335Q\271S\332\220\"\n\241\10Q\10!$!d/\262"..., 214850, 8944159649) = 214850
> [...]
> -----------------
> To me, this looks very suspicious.

It isn't. The pread() is performed for each delta object within the received pack that needs to be resolved, and then the access() is performed to make sure the resolved delta doesn't match an existing loose object with the same hash.

Of course deltas are recursive, meaning that a delta might refer to a base object which is itself a delta, and so on. And a base object might have many delta objects referring to it. So without a smart delta resolution ordering and caching, we'd end up with an exponential number of pread() calls. However, the cache size is limited to avoid memory exhaustion with deep and wide delta trees (that's the core.deltaBaseCacheLimit config variable).

So from the strace capture you sent me, we can get:

$ grep access strace.txt | wc -l
3925
$ grep pread strace.txt | wc -l
4095
$ grep pread strace.txt | sort | uniq -d | wc -l
75

So, given 3925 deltas to process, only 4095 objects were read, which is not too bad. Still, 75 of them were read more than once, which means they were evicted from the cache while they were still needed. core.deltaBaseCacheLimit could be increased to avoid those 75 duplicates.
Let's have a look at a few of them:

$ grep pread strace.txt | sort | uniq -cd | sort -nr
 20 pread(5, "x\234\354\275w\234\34\305\3210<;\263;\233\357nf\357v/I{\312\243\333=\235t\247p'"..., 1265653, 504922895) = 1265653
 20 pread(5, "x\234\254}\7|\24E\373\377\315\354f\357r\227v\227\344.\275A`\271\\\200\20:\204^\245\203"..., 264956, 506188555) = 264956
  6 pread(5, "x\234\274}\7xT\305\366\370\336\231\335\273\273\251\354f\263\233\36:\227d\3\201@ \224PB\257"..., 253102, 49016172335) = 253102
  6 pread(5, "x\234\274\275\7\224\34\305\3618<;\263;\263\341\322\356\336\355^\222N\361\30\335\355)\235\20w\2"..., 506683, 48982212429) = 506683
  6 pread(5, "x\234\254}\7x\34\305\25\360\336\336\335^Q\275;\351N\262d\313M\362\372tr\2231\226\1\27"..., 402609, 49245906707) = 402609
  6 pread(5, "x\234\254\275\t|\24\305\3628\2763\323;\273I6\t\331lvs\21B\270\206\315&\1\2H@"..., 176754, 49246749832) = 176754
  6 pread(5, "x\234\234}\7|T\305\366\377\336\331\315\246P\23H\205$t\226$t)\t\322\244\367\5\5\5\244"..., 236257, 49246513568) = 236257
  6 pread(5, "x\234\224}\7|TE\360\377\355\356\225t\270\224K\207$@r\244\2\241\205\320{\21\10 \35\351"..., 204238, 49246309323) = 204238
  5 pread(5, "x\234\264\275\7xT\305\0277\274\367\316\335\273%u7\311n*!\204rI6\t$\20\10\275\211"..., 233622, 49247108828) = 233622
  5 pread(5, "x\234\254\275\7|T\305\363\0~\357\275\275w\227\334%\341\222\334]zB \341\270\\\n\204\26:"..., 182228, 49246926593) = 182228
  5 pread(5, "x\234\234\274\5x\24\327\32\377\177\316\314fvf\26\22\226@\2\4\222l\4\"\33 \4\v\20\10"..., 70234, 49247342456) = 70234
  4 pread(5, "x\234m{\t\\TU\373\377\3343\\\34\206a\270s/\273\10\303* \340 \240\250\250\203\"\232"..., 9345, 49425395631) = 9345
  4 pread(5, "x\234-\326\177P\323e\34\300\3619`\342T\4\324)\23\1'\242\241\233c\300D:)~\232\250"..., 1211, 49248626398) = 1211
  4 pread(5, "x\234\314\275w\234\24E\363\7\274\323\263;\273\227o\367\356v\357\270\304\35\341\3662p\204;r\16"..., 149400, 49425246225) = 149400
  4 pread(5, "x\234\274\275\7|\34\305\3658\276\345n\257J\362\355Iw\262e[r_K'7al\31\343\2"..., 549072, 38602368468) = 549072

So... the first two objects are clearly a problem, as they are reloaded over and over. Given that their offsets are far away from the others, i.e. relatively near the beginning of the pack, they are probably quite high in their delta hierarchy. And what's really bad is to see them at the beginning of 10 pread() calls in a row, meaning that an entire delta chain has to be replayed in order to get back all those base objects that were evicted from the cache. That's clearly wasted CPU cycles, and it shouldn't happen with a large enough value for core.deltaBaseCacheLimit. Given that your files are "relatively" small, i.e. in the 4 MB range at most, the cache should be able to hold quite many of them. At the moment, with its 16 MB limit, only a few of those objects can quickly evict many others from the cache.

If this is still not good enough, then you could add a negative delta attribute to those large binary files (see http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html) and repack the repository on the server. Of course that will make the repository larger and the data transfer longer when cloning, but the "resolving deltas" phase will be much faster. This is therefore a tradeoff.

Another solution, which might be way more practical for users of such a huge repository, is simply to use a shallow clone. Surely the people cloning this repository might not need its full history. So you could simply use:

    git clone --depth=10 ...

and have only the last 10 revisions transferred. Later on, the repository can be deepened by passing a larger --depth value to the fetch command if need be.

Nicolas

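Put together, a shallow workflow along those lines might look like this; the URL is a placeholder:

    $ git clone --depth=10 git://server/project.git    # transfer only the most recent history
    $ cd project
    $ git fetch --depth=100                            # deepen the clone later if more history is needed
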
* Re: git clone: very long "resolving deltas" phase
From: Vitaly @ 2010-04-12 15:31 UTC
To: git

Hello,

On 04/11/2010 04:50 AM, Nicolas Pitre wrote:
> core.deltaBaseCacheLimit. Given that your files are "relatively" small,
> i.e. in the 4 MB range at most, the cache should be able to hold quite
> many of them. At the moment, with its 16 MB limit, only a few of those
> objects can quickly evict many others from the cache.
>
> If this is still not good enough, then you could add a negative delta
> attribute to those large binary files (see
> http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html)
> and repack the repository on the server. Of course that will make the
> repository larger and the data transfer longer when cloning, but the
> "resolving deltas" phase will be much faster. This is therefore a tradeoff.
>
> Another solution, which might be way more practical for users of such a
> huge repository, is simply to use a shallow clone. Surely the people
> cloning this repository might not need its full history. So you could
> simply use:
>
>     git clone --depth=10 ...
>
> and have only the last 10 revisions transferred. Later on, the repository
> can be deepened by passing a larger --depth value to the fetch command
> if need be.

Thanks for the comprehensive answer, Nicolas. Now I see three directions to work on: core.deltaBaseCacheLimit, the negative delta attribute, and shortening the history (actually, I don't think "clone --depth" is feasible in our environment, but we can try to make a backup and just purge the history).
