* git-fetch per-repository speed issues
@ 2006-07-03 18:02 Keith Packard
2006-07-03 23:14 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Keith Packard @ 2006-07-03 18:02 UTC (permalink / raw)
To: Git Mailing List; +Cc: keithp
Ok, so maybe X.org is using git in an unexpected (or even wrong)
fashion. Our environment has development split across dozens of separate
repositories which match ABI interfaces. With CVS, we were able to keep
all of this in one giant repository with separate modules, so we could
use cvsup or rsync to update the entire collection of modules in one go;
git doesn't have that notion (which is mostly good).
With git, we'd prefer to use the git protocol instead of rsync for the
usual pack-related reasons, but that is limited to a single repository
at a time. And, it's painfully slow, even when the repository is up to
date:
$ cd lib/libXrandr
$ time git-fetch origin
...
real 0m17.035s
user 0m2.584s
sys 0m0.576s
This is a repository with 24 files and perhaps 50 revisions. Given
X.org's 307 git repositories, I'll clearly need to find a faster way
than running git-fetch on every one.
One thing I noticed was that the git+ssh syntax found in remotes files
doesn't do what I thought it did -- I assumed this would use 'git' for
fetch and 'ssh' for push, when in fact it just uses ssh for everything.
This slows down the connection process by several seconds.
--
keith.packard@intel.com
^ permalink raw reply [flat|nested] 30+ messages in thread

* Re: git-fetch per-repository speed issues
From: Linus Torvalds @ 2006-07-03 23:14 UTC (permalink / raw)
To: Keith Packard; +Cc: Git Mailing List

On Mon, 3 Jul 2006, Keith Packard wrote:
>
> With git, we'd prefer to use the git protocol instead of rsync for the
> usual pack-related reasons, but that is limited to a single repository
> at a time.

Well, you could use multiple branches in the same repository, even if
they are totally unrelated. That would allow you to fetch them all in
one go.

One way to do that is to name the branches hierarchically: have one
repo, but call the branches something like

	libXrandr/master
	libXrandr/develop
	Xorg/master
	Xorg/develop
	...

> And, it's painfully slow, even when the repository is up to
> date:
>
> $ cd lib/libXrandr
> $ time git-fetch origin
> ...
>
> real    0m17.035s
> user    0m2.584s
> sys     0m0.576s

That's _seriously_ wrong. If everything is up-to-date, a fetch should
be basically zero-cost. That's especially true with the anonymous git
protocol, which doesn't have any connection validation overhead (for
the ssh protocol, the cost is usually the ssh login). But there may
well be some bug there.

Look at this:

	[torvalds@g5 git]$ time git fetch git://git.kernel.org/pub/scm/git/git.git

	real    0m0.431s
	user    0m0.036s
	sys     0m0.024s

and that's over my DSL line, not some studly network thing.

Basically, a repo that is up-to-date should do a "git fetch" about as
quickly as it does a "git ls-remote".
Which in turn really shouldn't be doing much of anything at all, apart
from the connect itself:

	[torvalds@g5 git]$ time git ls-remote master.kernel.org:/pub/scm/git/git.git > /dev/null

	real    0m1.758s
	user    0m0.188s
	sys     0m0.024s

	[torvalds@g5 git]$ time git ls-remote git://git.kernel.org/pub/scm/git/git.git > /dev/null

	real    0m0.431s
	user    0m0.056s
	sys     0m0.016s

(note how the ssh connection is much slower - it actually ends up doing
all the ssh back-and-forth).

Can you try from different hosts? One problem may be the remote end
trying to do reverse DNS lookups for xinetd or whatever.

Also, one thing to try is to just do

	strace -Ttt git-peek-remote ...

which shows where the time is going (I selected "git-peek-remote"
because that's a simple program).

		Linus
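[Editorial note: the hierarchical-branch setup Linus suggests can be sketched with the old-style remotes file; the URL and branch names below are illustrative, not taken from the thread.]

```
# Hypothetical $GIT_DIR/remotes/origin for a single umbrella repository
# (2006-era remotes-file format; repo URL and branch names are made up):
URL: git://anongit.example.org/xorg-all.git
Pull: refs/heads/libXrandr/master:refs/heads/libXrandr/master
Pull: refs/heads/Xorg/master:refs/heads/Xorg/master
```

A single "git fetch origin" would then update every tracked project over
one connection, instead of one connection (and one ssh handshake) per
repository.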
* Re: git-fetch per-repository speed issues
From: Jeff King @ 2006-07-04 0:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Keith Packard, Git Mailing List

On Mon, Jul 03, 2006 at 04:14:10PM -0700, Linus Torvalds wrote:

> Well, you could use multiple branches in the same repository, even if
> they are totally unrelated. That would allow you to fetch them all in
> one go.

One annoying thing about this is that you may want to have several of
the branches checked out at a time (i.e., you want the actual directory
structure of libXrandr/, Xorg/, etc.). You could pull everything down
into one repo and point small pseudo-repos at it with alternates, but I
would think that would become a mess with pushes. You can do some magic
with read-tree --prefix, but again, I'm not sure how you'd make commits
on the correct branch. Is there an easier way to do this?

> Basically, a repo that is up-to-date should do a "git fetch" about as
> quickly as it does a "git ls-remote". Which in turn really shouldn't
> be doing much of anything at all, apart from the connect itself:

Fetching by ssh actually makes two ssh connections (the second is to
grab tags).

-Peff
* Re: git-fetch per-repository speed issues
From: Ryan Anderson @ 2006-07-04 1:22 UTC (permalink / raw)
To: Jeff King; +Cc: Linus Torvalds, Keith Packard, Git Mailing List

Jeff King wrote:
> One annoying thing about this is that you may want to have several of
> the branches checked out at a time (i.e., you want the actual
> directory structure of libXrandr/, Xorg/, etc.). [...] Is there an
> easier way to do this?

You can have multiple source trees, one per 'branch' (which is a bit of
a bad term here), and have completely unrelated things in the branches.

See, for an example, the main git repo, which has the "man", "html",
and "todo" branches, logically distinct and (somewhat) unrelated to the
main branch tucked away in "master".

--
Ryan Anderson
  sometimes Pug Majere
* Re: git-fetch per-repository speed issues
From: Jeff King @ 2006-07-04 1:44 UTC (permalink / raw)
To: Ryan Anderson; +Cc: Linus Torvalds, Keith Packard, Git Mailing List

On Mon, Jul 03, 2006 at 06:22:26PM -0700, Ryan Anderson wrote:

> You can have multiple source trees, one per 'branch' (which is a bit
> of a bad term here), and have completely unrelated things in the
> branches.
>
> See, for an example, the main git repo, which has the "man", "html",
> and "todo" branches, logically distinct and (somewhat) unrelated to
> the main branch tucked away in "master".

Right, I know, but my complaint is that I can't then turn that into a
directory hierarchy of .../man, .../html, .../todo that are all checked
out at the same time (there are obviously ways of playing with it, say
by setting GIT_DIR and doing a checkout in those directories, but then
I can't use git in the normal way).

The best I can come up with is having man, html, and todo repos
pointing at the one (now local) repo which contains everything. But
then pushing is a two-step process.

-Peff
* Re: git-fetch per-repository speed issues
From: Ryan Anderson @ 2006-07-04 1:55 UTC (permalink / raw)
To: Jeff King; +Cc: Linus Torvalds, Keith Packard, Git Mailing List

Jeff King wrote:
> Right, I know, but my complaint is that I can't then turn that into a
> directory hierarchy of .../man, .../html, .../todo that are all
> checked out at the same time [...]
>
> The best I can come up with is having man, html, and todo repos
> pointing at the one (now local) repo which contains everything. But
> then pushing is a two-step process.

Hrm, if I understand CVS at all, the old workflow was "cvsup a copy of
the repository, update a working tree against that", which is, I think,
actually even worse than the git equivalent, since you can't reliably
even commit to that local clone of the CVS repository.

What am I missing? You can still push directly upstream, I suppose, and
just do 2-stage pulls down.

--
Ryan Anderson
  sometimes Pug Majere
* Re: git-fetch per-repository speed issues
From: Linus Torvalds @ 2006-07-04 3:07 UTC (permalink / raw)
To: Jeff King; +Cc: Keith Packard, Git Mailing List

On Mon, 3 Jul 2006, Jeff King wrote:
>
> Fetching by ssh actually makes two ssh connections (the second is to
> grab tags).

True. Although that should happen only if there are any new tags.

		Linus
* Re: git-fetch per-repository speed issues
From: Jeff King @ 2006-07-05 6:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List

On Mon, Jul 03, 2006 at 08:07:49PM -0700, Linus Torvalds wrote:

> > Fetching by ssh actually makes two ssh connections (the second is
> > to grab tags).
>
> True. Although that should happen only if there are any new tags.

Either you're wrong or there's a bug in git-fetch. I think you're
missing the call to git-ls-remote --tags to get the list of tags (which
we will then auto-follow if necessary). So in that case, there would
actually be 3 ssh connections. If everything is up to date, we still
make 2 connections (one to check refs from the remotes file, and one to
check the remote tag list).

-Peff
* Re: git-fetch per-repository speed issues
From: Linus Torvalds @ 2006-07-05 16:40 UTC (permalink / raw)
To: Jeff King; +Cc: Git Mailing List

On Wed, 5 Jul 2006, Jeff King wrote:
>
> Either you're wrong or there's a bug in git-fetch.

I was wrong - I forgot the git-ls-remote (which really should be
unnecessary, but the way git-fetch-pack works, we end up
re-connecting).

		Linus
* Re: git-fetch per-repository speed issues
From: Jakub Narebski @ 2006-07-04 6:44 UTC (permalink / raw)
To: git

Jeff King wrote:
> On Mon, Jul 03, 2006 at 04:14:10PM -0700, Linus Torvalds wrote:
>
> > Well, you could use multiple branches in the same repository, even
> > if they are totally unrelated. That would allow you to fetch them
> > all in one go.
>
> One annoying thing about this is that you may want to have several of
> the branches checked out at a time [...] Is there an easier way to do
> this?

Write proper subproject support for git, or pester someone to write it
(finally). See Subpro.txt in the todo branch.

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: git-fetch per-repository speed issues
From: Linus Torvalds @ 2006-07-04 3:21 UTC (permalink / raw)
To: Keith Packard; +Cc: Git Mailing List, Junio C Hamano

On Mon, 3 Jul 2006, Keith Packard wrote:
> On Mon, 2006-07-03 at 16:14 -0700, Linus Torvalds wrote:
> >
> > Well, you could use multiple branches in the same repository, even
> > if they are totally unrelated. That would allow you to fetch them
> > all in one go.
>
> I'd like to avoid this; the hope is that most people won't ever need
> to look at most repositories; it would be somewhat like having glibc
> in the same repo as the kernel...

Sure, understood. I'm just saying that if you want to fetch in one go,
it's one possibility. However, your setup has something else seriously
wrong.

> Yeah, I tried with the git protocol and it's a few seconds faster
> (about 14 seconds instead of 17).

Ick. That's -still- about 13 seconds too much.

> I think it might have something to do with the number of heads we're
> tracking.

It really shouldn't matter. You get all the heads in one go with a
single connection, so if 32 heads takes 32 times longer, there's
something wrong.

> > Also, one thing to try is to just do
> >
> > 	strace -Ttt git-peek-remote ...
>
> That's plenty fast, 0.410 seconds, with nothing ugly in the strace.

Ok, a "git fetch" really shouldn't take any longer than a single
connection. However, the fact that you have 32 heads, and it takes
pretty close to _exactly_ 32 times 0.410 seconds (32*0.410s = 13.1s)
makes me suspect that "git fetch" is just broken and fetches one branch
at a time.

Which would be just stupid.

But look as I might, I see only that one "git-fetch-pack" in
git-fetch.sh that should trigger. Once. Not 32 times. But your timings
sure sound like it's doing a _lot_ more than it should.
Junio, any ideas?

Keithp, can you try this trivial patch? It _should_ say something like

	Fetching refs/heads/master refs/heads/... refs/heads/... ... refs/heads/... from git://..../...

and more importantly, it should say so only once. And then it should
leave a "fetch.trace" file in your working directory, which should show
where that _one_ thing spends its time.

		Linus

----
diff --git a/git-fetch.sh b/git-fetch.sh
index 48818f8..4739202 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -339,6 +339,8 @@ fetch_main () {
       ( : subshell because we muck with IFS
 	IFS=" 	$LF"
 	(
+	  echo "Fetching $rref from $remote" >&2
+	  strace -o fetch.trace -Ttt \
 	  git-fetch-pack $exec $keep --thin "$remote" $rref ||
 	  echo failed "$remote"
 	) |
       while read sha1 remote_name
* Re: git-fetch per-repository speed issues
From: Junio C Hamano @ 2006-07-04 3:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> Ok, a "git fetch" really shouldn't take any longer than a single
> connection. However, the fact that you have 32 heads, and it takes
> pretty close to _exactly_ 32 times 0.410 seconds (32*0.410s = 13.1s)
> makes me suspect that "git fetch" is just broken and fetches one
> branch at a time.
>
> Which would be just stupid.
>
> But look as I might, I see only that one "git-fetch-pack" in
> git-fetch.sh that should trigger. Once. Not 32 times. But your
> timings sure sound like it's doing a _lot_ more than it should.
>
> Junio, any ideas?

Isn't that because the repository has 32 subprojects, totally unrelated
content-wise? If you have real stuff to pull from there, your pack
generation needs to do 32 times as much work as it would for a single
head in that case.

If you are discussing the "peek-remote runs, finds out the 32 heads are
all up to date, and no pack is generated" case, then you are right.
There is one single fetch-pack to grab the specified heads, and after
that, an optional single ls-remote and fetch-pack runs only once to
follow all new tags.
* Re: git-fetch per-repository speed issues
From: Linus Torvalds @ 2006-07-04 3:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git

On Mon, 3 Jul 2006, Junio C Hamano wrote:
>
> Isn't that because the repository has 32 subprojects, totally
> unrelated content-wise? If you have real stuff to pull from there,
> your pack generation needs to do 32 times as much work as it would
> for a single head in that case.

No, Keith said this was for the case where the fetching repository is
already totally up-to-date:

	"And, it's painfully slow, even when the repository is up to date"

and gave a 17-second time.

		Linus
* Re: git-fetch per-repository speed issues
From: Keith Packard @ 2006-07-04 4:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: keithp, Junio C Hamano, git

On Mon, 2006-07-03 at 20:40 -0700, Linus Torvalds wrote:

> 	"And, it's painfully slow, even when the repository is up to date"
>
> and gave a 17-second time.

It's faster this evening, down to 8 seconds using ssh and 4 seconds
using git. I clearly need to force use of the git protocol. Anyone else
like the attached patch?

---
 connect.c |   18 ++++++++++++++----
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/connect.c b/connect.c
index 9a87bd9..e74eddc 100644
--- a/connect.c
+++ b/connect.c
@@ -303,6 +303,7 @@ enum protocol {
 	PROTO_LOCAL = 1,
 	PROTO_SSH,
 	PROTO_GIT,
+	PROTO_GIT_SSH,
 };

 static enum protocol get_protocol(const char *name)
@@ -312,9 +313,9 @@ static enum protocol get_protocol(const
 	if (!strcmp(name, "git"))
 		return PROTO_GIT;
 	if (!strcmp(name, "git+ssh"))
-		return PROTO_SSH;
+		return PROTO_GIT_SSH;
 	if (!strcmp(name, "ssh+git"))
-		return PROTO_SSH;
+		return PROTO_GIT_SSH;
 	die("I don't handle protocol '%s'", name);
 }

@@ -572,6 +573,14 @@ static void git_proxy_connect(int fd[2],
 	close(pipefd[1][0]);
 }

+/* returns whether the specified command can be interpreted by the daemon */
+int git_is_daemon_command(const char *prog)
+{
+	if (!strcmp("git-upload-pack", prog))
+		return 1;
+	return 0;
+}
+
 /*
  * Yeah, yeah, fixme. Need to pass in the heads etc.
  */
@@ -641,7 +650,8 @@ int git_connect(int fd[2], char *url, co
 		*ptr = '\0';
 	}

-	if (protocol == PROTO_GIT) {
+	if (protocol == PROTO_GIT ||
+	    (protocol == PROTO_GIT_SSH && git_is_daemon_command(prog))) {
 		/* These underlying connection commands die() if they
 		 * cannot connect.
 		 */
@@ -678,7 +688,7 @@ int git_connect(int fd[2], char *url, co
 	close(pipefd[0][1]);
 	close(pipefd[1][0]);
 	close(pipefd[1][1]);
-	if (protocol == PROTO_SSH) {
+	if (protocol == PROTO_SSH || protocol == PROTO_GIT_SSH) {
 		const char *ssh, *ssh_basename;
 		ssh = getenv("GIT_SSH");
 		if (!ssh) ssh = "ssh";
--
1.4.1.g8fced-dirty

--
keith.packard@intel.com
* Re: git-fetch per-repository speed issues
From: Andreas Ericsson @ 2006-07-04 11:10 UTC (permalink / raw)
To: Keith Packard; +Cc: Linus Torvalds, Junio C Hamano, git

Keith Packard wrote:
> It's faster this evening, down to 8 seconds using ssh and 4 seconds
> using git. I clearly need to force use of the git protocol. Anyone
> else like the attached patch?

Since it changes the current meaning of ssh+git, I'm not exactly
thrilled. However, "git/ssh" or "ssh/git" would work fine for me. The
slash separator could be used to say "fetch over this, push over that",
so we can end up with any valid protocol to use for fetches and another
one to push over.

--
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
* Re: git-fetch per-repository speed issues
From: Matthias Kestenholz @ 2006-07-04 11:18 UTC (permalink / raw)
To: Andreas Ericsson; +Cc: git

* Andreas Ericsson (ae@op5.se) wrote:
> Since it changes the current meaning of ssh+git, I'm not exactly
> thrilled. However, "git/ssh" or "ssh/git" would work fine for me. The
> slash separator could be used to say "fetch over this, push over
> that", so we can end up with any valid protocol to use for fetches
> and another one to push over.

If we did such a thing, we would probably be better off allowing
different URLs for pushing and pulling, because the git and ssh URLs
will only be the same if the git repositories are located in the root
folder, and I suspect that's almost never the case.

Matthias
* Re: git-fetch per-repository speed issues
From: Andreas Ericsson @ 2006-07-04 12:05 UTC (permalink / raw)
To: Matthias Kestenholz; +Cc: git

Matthias Kestenholz wrote:
> If we did such a thing, we would probably be better off allowing
> different URLs for pushing and pulling, because the git and ssh URLs
> will only be the same if the git repositories are located in the root
> folder, and I suspect that's almost never the case.

True. We use relative paths where I work, so for us either way would
work. Your way is better though.

--
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
* Re: git-fetch per-repository speed issues
From: Keith Packard @ 2006-07-04 4:02 UTC (permalink / raw)
To: Linus Torvalds, Git Mailing List; +Cc: keithp

On Mon, 2006-07-03 at 20:21 -0700, Linus Torvalds wrote:

> Keithp, can you try this trivial patch? It _should_ say something like

Yeah, it says that only once. And, it runs the fetch-pack in about .5
seconds. And, now the whole process completes in 4.7 seconds; perhaps
the remote server is less loaded than earlier this afternoon? It's also
possible that I was running old git bits here, but I don't think so.

> And then it should leave a "fetch.trace" file in your working
> directory, which should show where that _one_ thing spends its time.

It looks boring to me and spent 0.55 seconds from start to finish. I
can send along the whole trace if you have an acute desire to peer at
it.

--
keith.packard@intel.com
* Re: git-fetch per-repository speed issues
From: Linus Torvalds @ 2006-07-04 4:19 UTC (permalink / raw)
To: Keith Packard; +Cc: Git Mailing List

On Mon, 3 Jul 2006, Keith Packard wrote:
>
> Yeah, it says that only once. And, it runs the fetch-pack in about .5
> seconds. And, now the whole process completes in 4.7 seconds; perhaps
> the remote server is less loaded than earlier this afternoon?

Well, that's still strange. What takes 4.2 seconds then?

> > And then it should leave a "fetch.trace" file in your working
> > directory, which should show where that _one_ thing spends its time.
>
> It looks boring to me and spent 0.55 seconds from start to finish. I
> can send along the whole trace if you have an acute desire to peer at
> it.

No, the 0.5 seconds is what I _expected_. There's something strange
going on in your git fetch that makes it take any longer than that.

Can you instrument your "git-fetch.sh" script (just add random

	(echo $LINENO ; date) >&2

lines all over) to see what is so expensive? That fetch-pack really
should be the most expensive part by far (and half a second sounds
right), but it clearly isn't. At 4.7s, your fetch is still taking about
ten times longer than it _should_.

		Linus
* Re: git-fetch per-repository speed issues
From: Keith Packard @ 2006-07-04 5:05 UTC (permalink / raw)
To: Linus Torvalds; +Cc: keithp, Git Mailing List

On Mon, 2006-07-03 at 21:19 -0700, Linus Torvalds wrote:

> Can you instrument your "git-fetch.sh" script (just add random
>
> 	(echo $LINENO ; date) >&2
>
> lines all over) to see what is so expensive?

	  5  Start:                           21:59:01.584648000
	 66  After args:                      21:59:01.605987000
	248  fetch_main() start:              21:59:02.408559000
	339  fetch_main() before fetch-pack:  21:59:03.293228000
	387  fetch_main() done:               21:59:04.784388000
	422  After tag following:             21:59:05.311439000
	438  All done:                        21:59:05.315338000

fetch-pack itself took 0.421 seconds (measured with time(1)).

Looks like the bulk of the time here is caused by simple shell
processing overhead, some of which scales with the number of heads and
tags to track.

--
keith.packard@intel.com
* Re: git-fetch per-repository speed issues
From: Linus Torvalds @ 2006-07-04 5:36 UTC (permalink / raw)
To: Keith Packard; +Cc: Git Mailing List

On Mon, 3 Jul 2006, Keith Packard wrote:
>
> 	  5  Start:                           21:59:01.584648000
> 	 66  After args:                      21:59:01.605987000
> 	248  fetch_main() start:              21:59:02.408559000
> 	339  fetch_main() before fetch-pack:  21:59:03.293228000
> 	387  fetch_main() done:               21:59:04.784388000
> 	422  After tag following:             21:59:05.311439000
> 	438  All done:                        21:59:05.315338000
>
> fetch-pack itself took 0.421 seconds (measured with time(1)).
>
> Looks like the bulk of the time here is caused by simple shell
> processing overhead, some of which scales with the number of heads
> and tags to track.

Ahh.. Do you have tons of tags at the other end?

Looking closer, I suspect a big part of it is the

	git-ls-remote $upload_pack --tags "$remote" |
	sed -ne 's|^\([0-9a-f]*\)[ 	]\(refs/tags/.*\)^{}$|\1 \2|p' |
	while read sha1 name
	do
		..
	done

loop. With a lot of tags, the shell overhead there can indeed be pretty
disgusting.

And I was wrong - I thought it would do that git-ls-remote only if the
first time around we noticed that we would need to, but we actually do
it every time we're fetching any new branches.

The sad part is that we really already got the list once, we just never
saved it away (ie "git-fetch-pack" actually _knows_ what the tags at
the other end are, and also knows which tags we already have, so if we
made git-fetch-pack just create that list and save it off, all the
overhead would just go away).

And yes, the shell script loops are really really simple, but some of
them are actually quadratic in the number of refs (O(local*remote)).
If this were a C program, we'd never even care, but with shell, the
thing is slow enough that having even a modest number of tags and refs
is going to make it waste a lot of time in shell scripting.

We already do a lot of the infrastructure for "git fetch" in C - the
remotes parsing etc. is all stuff that "git fetch" used to share with
"git push", but "git push" has been a builtin C program for a while
now. I suspect we should just do the same to "git fetch", which would
make all these issues just totally go away.

		Linus
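[Editorial note: the quadratic cost is easy to see in miniature. Below is a toy sketch, not git's actual loop, of the O(local*remote) pattern Linus describes: for every remote ref, the shell rescans the entire local list. All ref names are made up for illustration.]

```shell
#!/bin/sh
# Toy model of O(local*remote) ref matching in shell (illustrative only).
remote_refs="refs/tags/v1 refs/tags/v2 refs/tags/v3"
local_refs="refs/tags/v2"

for r in $remote_refs; do
	have=no
	# Inner scan over every local ref, repeated for every remote ref.
	for l in $local_refs; do
		[ "$r" = "$l" ] && have=yes
	done
	# Only refs we don't have yet would need fetching.
	[ "$have" = no ] && echo "fetch $r"
done
# Prints: fetch refs/tags/v1
#         fetch refs/tags/v3
```

With hundreds of tags on each side, the same pattern performs tens of thousands of comparisons; a single sorted merge pass (or doing it in C, as suggested) removes the quadratic factor.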
* Re: git-fetch per-repository speed issues
From: Junio C Hamano @ 2006-07-04 6:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> Looking closer, I suspect a big part of it is the
>
> 	git-ls-remote $upload_pack --tags "$remote" |
> 	sed -ne 's|^\([0-9a-f]*\)[ 	]\(refs/tags/.*\)^{}$|\1 \2|p' |
> 	while read sha1 name
> 	do
> 		..
> 	done
>
> loop.

Yes indeed. Maybe we can do this loop in Perl.

Doing the whole thing in C is another option, but it would be somewhat
painful unless we can deprecate all transports but the git native
protocols. On the other hand, 5 seconds may not matter that much in
practice.
* Re: git-fetch per-repository speed issues
  2006-07-04  4:19       ` Linus Torvalds
  2006-07-04  5:05         ` Keith Packard
@ 2006-07-04  5:29         ` Keith Packard
  2006-07-04  5:53           ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Keith Packard @ 2006-07-04  5:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: keithp, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 837 bytes --]

On Mon, 2006-07-03 at 21:19 -0700, Linus Torvalds wrote:

> Well, that's still strange. What takes 4.2 seconds then?

$ strace -e trace=execve -f git-fetch 2>&1 |
  grep execve | sed -e 's/^.*execve("//' -e 's/".*$//' |
  sort | uniq -c | sort -n
      1 /bin/rm
      1 /home/keithp/bin/git
      1 /home/keithp/bin/git-fetch
      1 /home/keithp/bin/git-fetch-pack
      1 /home/keithp/bin/git-ls-remote
      1 /home/keithp/bin/git-peek-remote
      1 /usr/bin/sort
      3 /bin/sed
      4 /home/keithp/bin/git-repo-config
     30 /bin/mkdir
     30 /home/keithp/bin/git-cat-file
     30 /home/keithp/bin/git-check-ref-format
     30 /home/keithp/bin/git-merge-base
     30 /usr/bin/dirname
     64 /home/keithp/bin/git-rev-parse
    361 /usr/bin/expr

someone sure likes 'expr'...

-- 
keith.packard@intel.com

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
  2006-07-04  5:29         ` Keith Packard
@ 2006-07-04  5:53           ` Linus Torvalds
  0 siblings, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2006-07-04  5:53 UTC (permalink / raw)
  To: Keith Packard; +Cc: Git Mailing List

On Mon, 3 Jul 2006, Keith Packard wrote:
>
>     361 /usr/bin/expr
>
> someone sure likes 'expr'...

Heh. That's a very Junio thing to do. Junio seems to like

	if expr "z$string" : "z<regexp>" >/dev/null
	then
		..

and I think he explained it as being the way old-fashioned users do it.

		Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread
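[Editorial note: the cost of that idiom is a fork+exec per test — the 361 execve calls in the strace output above. The shell's `case` builtin does the same anchored prefix match without spawning anything; a minimal sketch, not a claim about which patch git actually applied:]

```shell
string="refs/tags/v1.0"

# The idiom quoted above: expr(1) is a separate program, so every
# match test costs a fork+exec.
if expr "z$string" : "zrefs/tags/" >/dev/null
then
	echo "expr says: tag"
fi

# The fork-free equivalent: case is a shell builtin, so hundreds of
# these tests spawn no extra processes.
case "$string" in
refs/tags/*)
	echo "case says: tag" ;;
esac
```

Running the block prints both "expr says: tag" and "case says: tag"; the difference only shows up in process accounting, which is exactly what the strace histogram measured.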
* Re: git-fetch per-repository speed issues
  2006-07-03 18:02 git-fetch per-repository speed issues Keith Packard
  2006-07-03 23:14 ` Linus Torvalds
@ 2006-07-04 15:42 ` Jakub Narebski
  2006-07-04 16:30   ` Thomas Glanzmann
  2006-07-04 17:45   ` Junio C Hamano
  2006-07-06 23:36 ` David Woodhouse
  2 siblings, 2 replies; 30+ messages in thread
From: Jakub Narebski @ 2006-07-04 15:42 UTC (permalink / raw)
  To: git

I wonder if the problem detected here is also responsible for the
results of Jeremy Blosser's benchmark comparing git with Mercurial
  http://lists.ibiblio.org/pipermail/sm-discuss/2006-May/014586.html
where git wins for clone, status and log, but is slower for pull.

See the summary at
  http://git.or.cz/gitwiki/GitBenchmarks#head-85df1bb7f019c4c504e34cde43450ef69349882f

-- 
Jakub Narebski

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
  2006-07-04 15:42 ` Jakub Narebski
@ 2006-07-04 16:30   ` Thomas Glanzmann
  2006-07-04 17:45   ` Junio C Hamano
  1 sibling, 0 replies; 30+ messages in thread
From: Thomas Glanzmann @ 2006-07-04 16:30 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Hello,

> See summary at
> http://git.or.cz/gitwiki/GitBenchmarks#head-85df1bb7f019c4c504e34cde43450ef69349882f

thank you for clarifying! I finally understand why the Solaris folks
prefer hg over git: it is dog slow - so it fits the general philosophy
behind Solaris.

		Thomas

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
  2006-07-04 15:42 ` Jakub Narebski
  2006-07-04 16:30   ` Thomas Glanzmann
@ 2006-07-04 17:45   ` Junio C Hamano
  2006-07-04 19:22     ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2006-07-04 17:45 UTC (permalink / raw)
  To: git; +Cc: jnareb

Jakub Narebski <jnareb@gmail.com> writes:

> I wonder if the problem detected here is also responsible with results
> of Jeremy Blosser benchmark comparing git with Mercurial
> http://lists.ibiblio.org/pipermail/sm-discuss/2006-May/014586.html
> where git wins for clone, status and log, but is slower for pull.

I had an impression, though the report does not talk about this
specific detail, that the extra time we are paying is because
the "git pull" test is done without suppressing the final
diffstat phase.

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
  2006-07-04 17:45   ` Junio C Hamano
@ 2006-07-04 19:22     ` Linus Torvalds
  2006-07-04 21:05       ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2006-07-04 19:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jnareb

On Tue, 4 Jul 2006, Junio C Hamano wrote:
>
> I had an impression, though the report does not talk about this
> specific detail, that the extra time we are paying is because
> the "git pull" test is done without suppressing the final
> diffstat phase.

I'm pretty sure that was the reason for the particular hg issue.
Looking at the "clone" times, the problem is almost certainly not the
actual pulling.

The diffstat generation is often the largest part of a git merge. It's
gotten cheaper since the hg benchmarks were done (I think they were
done back before the integrated diff generation, so they also had the
overhead of executing a lot of external GNU diff processes), but it's
still not "cheap".

But I have to say that the diffstat at least for me is absolutely
invaluable.

		Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
  2006-07-04 19:22     ` Linus Torvalds
@ 2006-07-04 21:05       ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2006-07-04 21:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> But I have to say that the diffstat at least for me is absolutely
> invaluable.

Oh, I absolutely agree with that, and anybody who suggests turning it
off by default needs a very good argument to convince me.

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: git-fetch per-repository speed issues
  2006-07-03 18:02 git-fetch per-repository speed issues Keith Packard
  2006-07-03 23:14 ` Linus Torvalds
  2006-07-04 15:42 ` Jakub Narebski
@ 2006-07-06 23:36 ` David Woodhouse
  2 siblings, 0 replies; 30+ messages in thread
From: David Woodhouse @ 2006-07-06 23:36 UTC (permalink / raw)
  To: Keith Packard; +Cc: Git Mailing List

On Mon, 2006-07-03 at 11:02 -0700, Keith Packard wrote:
> just uses ssh for everything. This slows down the connection process
> by several seconds.

Only if you forgot to use the 'control socket' support, which lets you
make a _single_ authenticated connection and re-use it for multiple
sessions.

http://david.woodhou.se/openssh-control.html has a couple of
improvements, but the basics are usable in upstream openssh.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 30+ messages in thread
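[Editorial note: the control-socket support David refers to is configured in `~/.ssh/config` with the stock OpenSSH `ControlMaster`/`ControlPath` options; the host name below is a placeholder, not an X.org server:]

```
Host git.example.org
	ControlMaster auto
	ControlPath ~/.ssh/control-%r@%h:%p
```

With `ControlMaster auto`, the first ssh connection to that host becomes the master and authenticates normally; every later connection — including each of the 307 per-repository git-fetch runs — multiplexes over the existing socket and skips the multi-second login handshake entirely.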
end of thread, other threads:[~2006-07-06 23:36 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-07-03 18:02 git-fetch per-repository speed issues Keith Packard
2006-07-03 23:14 ` Linus Torvalds
2006-07-04 0:21 ` Jeff King
2006-07-04 1:22 ` Ryan Anderson
2006-07-04 1:44 ` Jeff King
2006-07-04 1:55 ` Ryan Anderson
2006-07-04 3:07 ` Linus Torvalds
2006-07-05 6:47 ` Jeff King
2006-07-05 16:40 ` Linus Torvalds
2006-07-04 6:44 ` Jakub Narebski
[not found] ` <1151973438.4723.70.camel@neko.keithp.com>
2006-07-04 3:21 ` Linus Torvalds
2006-07-04 3:30 ` Junio C Hamano
2006-07-04 3:40 ` Linus Torvalds
2006-07-04 4:30 ` Keith Packard
2006-07-04 11:10 ` Andreas Ericsson
2006-07-04 11:18 ` Matthias Kestenholz
2006-07-04 12:05 ` Andreas Ericsson
2006-07-04 4:02 ` Keith Packard
2006-07-04 4:19 ` Linus Torvalds
2006-07-04 5:05 ` Keith Packard
2006-07-04 5:36 ` Linus Torvalds
2006-07-04 6:21 ` Junio C Hamano
2006-07-04 5:29 ` Keith Packard
2006-07-04 5:53 ` Linus Torvalds
2006-07-04 15:42 ` Jakub Narebski
2006-07-04 16:30 ` Thomas Glanzmann
2006-07-04 17:45 ` Junio C Hamano
2006-07-04 19:22 ` Linus Torvalds
2006-07-04 21:05 ` Junio C Hamano
2006-07-06 23:36 ` David Woodhouse
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).