* git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-10 20:22 UTC
To: git

Hello,

I'm seeing very slow performance with 'git-branch'. Is this the
canonical way to find out the current branch? (I know I can look into
.git/HEAD, but how likely is that to break in the future?)

hanwen@lilypond:/tmp/z$ time git branch
* foo
  master

real    0m0.307s
user    0m0.232s
sys     0m0.038s

hanwen@lilypond:/tmp/z$ git --version
git version 1.5.1.rc1.949.g322bc

On NFS this takes 5 seconds.

Note that I have a humongous number of remotes, but those should not be
examined without -r, right?

hanwen@lilypond:/tmp/z$ find .git/refs/remotes | wc -l
1856

--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: Lars Hjemli @ 2007-10-10 20:44 UTC
To: hanwen; +Cc: git

On 10/10/07, Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> I'm seeing very slow performance with 'git-branch'. Is this the
> canonical way to find out the current branch?

You could also try 'git symbolic-ref HEAD', but see below...

> hanwen@lilypond:/tmp/z$ find .git/refs/remotes | wc -l
> 1856

You probably want to run 'git gc' (which will run 'git pack-refs',
i.e. put all files currently under .git/refs into a single file). This
should speed up 'git branch' (and quite possibly other commands too).

--
larsh

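For readers wondering what looking into .git/HEAD amounts to, here is a minimal sketch. It assumes HEAD is a plain file containing either "ref: refs/heads/<name>" or a raw commit id; the function name current_branch and the fake repository directory are illustrative only. 'git symbolic-ref HEAD', as suggested above, remains the interface to rely on.

```python
import os
import tempfile


def current_branch(git_dir):
    """Return the checked-out branch name, or None if HEAD is detached.

    Assumption: HEAD is a regular file holding 'ref: refs/heads/<name>'
    or a raw commit id. Other layouts (for example a symlinked HEAD in
    very old repositories) are not handled by this sketch.
    """
    with open(os.path.join(git_dir, "HEAD")) as f:
        content = f.read().strip()
    prefix = "ref: refs/heads/"
    if content.startswith(prefix):
        return content[len(prefix):]
    return None


# Demo on a fake .git directory, so no git installation is required.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/foo\n")
print(current_branch(demo_dir))
```

Reading one small file avoids touching anything under .git/refs, which is why this lookup is not affected by the number of remote refs.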
* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-10 21:17 UTC
To: Lars Hjemli; +Cc: git

2007/10/10, Lars Hjemli <hjemli@gmail.com>:
> On 10/10/07, Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> > I'm seeing very slow performance with 'git-branch'. Is this the
> > canonical way to find out the current branch?
>
> You could also try 'git symbolic-ref HEAD', but see below...
>
> > hanwen@lilypond:/tmp/z$ find .git/refs/remotes | wc -l
> > 1856
>
> You probably want to run 'git gc' (which will run 'git pack-refs',
> i.e. put all files currently under .git/refs into a single file). This
> should speed up 'git branch' (and quite possibly other commands too).

This seems rather unuseful. After running gc pack-refs --all, I lost my HEAD,

hanwen@lilypond:~/vc/git5$ git show HEAD
fatal: ambiguous argument 'HEAD': unknown revision or path not in the
working tree.
Use '--' to separate paths from revisions

Is there a way to only pack refs under a certain subdirectory of
.git/refs? (I'm thinking of .git/refs/remotes)

--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-10 21:24 UTC
To: Lars Hjemli; +Cc: git

2007/10/10, Han-Wen Nienhuys <hanwenn@gmail.com>:
> > You probably want to run 'git gc' (which will run 'git pack-refs',
> > i.e. put all files currently under .git/refs into a single file). This
> > should speed up 'git branch' (and quite possibly other commands too).
>
> This seems rather unuseful. After running gc pack-refs --all, I lost my HEAD,
>
> hanwen@lilypond:~/vc/git5$ git show HEAD
> fatal: ambiguous argument 'HEAD': unknown revision or path not in the
> working tree.

More to the point, I seem to have lost my entire repository. This is
the type of surprise I don't enjoy.

Now, can someone explain why 'git branch' takes forever if there are
only two non-remote branches?

--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-10 21:30 UTC
To: Lars Hjemli; +Cc: git

2007/10/10, Han-Wen Nienhuys <hanwenn@gmail.com>:
> More to the point, I seemed to have lost my entire repository. This is
> the type of surprise I don't enjoy.
>
> Now, can someone explain why 'git branch' takes forever if there are
> only two non-remote branches ?

So,

here is a question: I would like to share commitishes between two checkouts
of a repository. The reason for this is that I want to easily cherry
pick back and forth between the two. The files in one of them
should be continually available, since I am running out of that
directory.

The way I solved that was to have both repositories pointing to each
other, using alternates.

Now, after a couple of gc and pack-refs iterations, I am greeted by

hanwen@lilypond:~/vc/git6$ git fsck
missing tree 12b00ec3190f7b46a5fe0a3235445bead4c9645b
broken link from    tree 1718d09e0394d113c162e4a3471e7a1f20914a94
              to    blob 635e2802568b85017007698c0e6dd4d28dca496f
broken link from    tree 926899798fce75038e24f8fa1838f6da8bcf105f
              to    tree f1b852d270ebbaaf95d8ddc06c52763bad11ff25
missing blob 99f0c0d63276fce444e3a200167b636236784c52
missing tree f1b852d270ebbaaf95d8ddc06c52763bad11ff25
missing blob 236962a87fafae8ca2dce2dc550d344aa7a8884a
missing blob 7d69ca297f392a954c4cdcb62bb4c8a90ddb862b
missing blob 9e39be8f5cb4eeff97fcfd6eb77fefeda02f0e71
dangling blob f3a93f023080ce9fc6becb397e366cc4ceb192f5

Could it be that GC does not handle cyclic alternates correctly?

--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: J. Bruce Fields @ 2007-10-10 21:39 UTC
To: hanwen; +Cc: Lars Hjemli, git

On Wed, Oct 10, 2007 at 06:30:02PM -0300, Han-Wen Nienhuys wrote:
> 2007/10/10, Han-Wen Nienhuys <hanwenn@gmail.com>:
> > More to the point, I seemed to have lost my entire repository. This is
> > the type of surprise I don't enjoy.
> >
> > Now, can someone explain why 'git branch' takes forever if there are
> > only two non-remote branches ?
>
> So,
>
> Here is a question: I would like to share commitishes between two checkouts
> of a repository. The reason for this is that I want to easily cherry
> pick back and forth between the two. The files of in one of them
> should be continually available, since I am running out of that
> directory.
>
> The way I solved that, was to have both repositories pointing to each
> other, using alternates.
>
> Now, after a couple of gc and pack-refs iterations, I am greeted by
>
> hanwen@lilypond:~/vc/git6$ git fsck
> missing tree 12b00ec3190f7b46a5fe0a3235445bead4c9645b
> broken link from    tree 1718d09e0394d113c162e4a3471e7a1f20914a94
>               to    blob 635e2802568b85017007698c0e6dd4d28dca496f
> broken link from    tree 926899798fce75038e24f8fa1838f6da8bcf105f
>               to    tree f1b852d270ebbaaf95d8ddc06c52763bad11ff25
> missing blob 99f0c0d63276fce444e3a200167b636236784c52
> missing tree f1b852d270ebbaaf95d8ddc06c52763bad11ff25
> missing blob 236962a87fafae8ca2dce2dc550d344aa7a8884a
> missing blob 7d69ca297f392a954c4cdcb62bb4c8a90ddb862b
> missing blob 9e39be8f5cb4eeff97fcfd6eb77fefeda02f0e71
> dangling blob f3a93f023080ce9fc6becb397e366cc4ceb192f5
>
>
> could it be that GC does not handle cyclic alternates correctly?

Does it handle alternates at all? If you run git-gc on a repository
which other repositories get objects from, then my impression was that
bad things happen.

--b.

* Re: git branch performance problem?
From: Lars Hjemli @ 2007-10-10 21:45 UTC
To: J. Bruce Fields; +Cc: hanwen, git

On 10/10/07, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Wed, Oct 10, 2007 at 06:30:02PM -0300, Han-Wen Nienhuys wrote:
> > could it be that GC does not handle cyclic alternates correctly?
>
> Does it handle alternates at all? If you run git-gc on a repository
> which other repositories get objects from, then my impression was that
> bad things happen.
>

AFAIK 'git gc' is safe, while 'git gc --prune' will remove loose
(unreferenced) objects.

--
larsh

* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-10 21:49 UTC
To: Lars Hjemli; +Cc: J. Bruce Fields, git

2007/10/10, Lars Hjemli <hjemli@gmail.com>:
> On 10/10/07, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Wed, Oct 10, 2007 at 06:30:02PM -0300, Han-Wen Nienhuys wrote:
> > > could it be that GC does not handle cyclic alternates correctly?
> >
> > Does it handle alternates at all? If you run git-gc on a repository
> > which other repositories get objects from, then my impression was that
> > bad things happen.
> >
>
> AFAIK 'git gc' is safe, while 'git gc --prune' will remove loose
> (unreferenced) objects.

Yes, I think that in this case, gc --prune was run accidentally, but
given that the history of the program invoking git just died, I'm not
sure how to figure that out.

Maybe gc --prune could follow the alternates and abort if a cycle was detected?

--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: J. Bruce Fields @ 2007-10-10 21:53 UTC
To: hanwen; +Cc: Lars Hjemli, git

On Wed, Oct 10, 2007 at 06:49:19PM -0300, Han-Wen Nienhuys wrote:
> 2007/10/10, Lars Hjemli <hjemli@gmail.com>:
> > On 10/10/07, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > On Wed, Oct 10, 2007 at 06:30:02PM -0300, Han-Wen Nienhuys wrote:
> > > > could it be that GC does not handle cyclic alternates correctly?
> > >
> > > Does it handle alternates at all? If you run git-gc on a repository
> > > which other repositories get objects from, then my impression was that
> > > bad things happen.
> > >
> >
> > AFAIK 'git gc' is safe, while 'git gc --prune' will remove loose
> > (unreferenced) objects.
>
> Yes, I think that in this case, gc --prune was run accidentally, but
> given that the history of the program invoking git just died, I'm not
> sure how to figure that out.
>
> Maybe gc --prune could follow the alternates and abort if a cycle was detected?

Don't the alternates point in the wrong direction? You'd need pointers
back from the main repository to the repositories that depend on it for
objects.

Which would be nice....

--b.

* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-10 22:01 UTC
To: J. Bruce Fields; +Cc: Lars Hjemli, git

2007/10/10, J. Bruce Fields <bfields@fieldses.org>:
> > Maybe gc --prune could follow the alternates and abort if a cycle was detected?
>
> Don't the alternates point in the wrong direction? You'd need pointers
> back from the main repository to the repositories that depend on it for
> objects.
>
> Which would be nice....

The development repo was cloned from the main repo; then sometimes I
cherry pick from development into the main repo. Hence alternates in 2
directions.

--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: Johannes Schindelin @ 2007-10-10 21:53 UTC
To: hanwen; +Cc: Lars Hjemli, J. Bruce Fields, git

Hi,

On Wed, 10 Oct 2007, Han-Wen Nienhuys wrote:

> 2007/10/10, Lars Hjemli <hjemli@gmail.com>:
> > On 10/10/07, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > On Wed, Oct 10, 2007 at 06:30:02PM -0300, Han-Wen Nienhuys wrote:
> > > > could it be that GC does not handle cyclic alternates correctly?
> > >
> > > Does it handle alternates at all? If you run git-gc on a repository
> > > which other repositories get objects from, then my impression was
> > > that bad things happen.
> > >
> >
> > AFAIK 'git gc' is safe, while 'git gc --prune' will remove loose
> > (unreferenced) objects.
>
> Yes, I think that in this case, gc --prune was run accidentally, but
> given that the history of the program invoking git just died, I'm not
> sure how to figure that out.
>
> Maybe gc --prune could follow the alternates and abort if a cycle was
> detected?

I think we talked about this quite some time ago, and the resolution was
that it is too hard. Now that it bit somebody in real life, I think we
have to try harder.

And probably the best place to check would be git-prune, not git-gc, since
that is the program (called by gc) that most probably killed your repo.

Come to think of it, it should probably be part of git-repack, too.

Will try to cobble up a patch,
Dscho

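The cycle check proposed above could look roughly like the following. This is only a sketch of the idea, not what git-prune or git-repack actually do: it assumes each object directory lists its alternates one path per line in objects/info/alternates, with relative paths resolved against the object directory, and the helper names read_alternates and has_alternates_cycle are made up for the illustration.

```python
import os
import tempfile


def read_alternates(objects_dir):
    """Return the object directories listed in <objects_dir>/info/alternates."""
    path = os.path.join(objects_dir, "info", "alternates")
    if not os.path.exists(path):
        return []
    result = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                # Relative entries are taken relative to the object directory.
                result.append(os.path.realpath(os.path.join(objects_dir, line)))
    return result


def has_alternates_cycle(objects_dir):
    """Depth-first walk; True if a directory reappears on the current path."""
    def visit(directory, on_path):
        if directory in on_path:
            return True
        on_path.add(directory)
        found = any(visit(alt, on_path) for alt in read_alternates(directory))
        on_path.discard(directory)
        return found

    return visit(os.path.realpath(objects_dir), set())


# Demo: two fake object stores pointing at each other, as in the thread.
base = os.path.realpath(tempfile.mkdtemp())
main_objects = os.path.join(base, "main", ".git", "objects")
dev_objects = os.path.join(base, "dev", ".git", "objects")
for objects_dir, other in ((main_objects, dev_objects), (dev_objects, main_objects)):
    os.makedirs(os.path.join(objects_dir, "info"))
    with open(os.path.join(objects_dir, "info", "alternates"), "w") as f:
        f.write(other + "\n")
print(has_alternates_cycle(main_objects))
```

Note that such a check only sees the alternates a repository points to. It cannot discover other repositories that borrow objects from it, which is the direction problem raised earlier in the thread.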
* Re: Spam: Re: git branch performance problem?
From: Brandon Casey @ 2007-10-10 22:55 UTC
To: Lars Hjemli; +Cc: J. Bruce Fields, hanwen, git

Lars Hjemli wrote:
> On 10/10/07, J. Bruce Fields <bfields@fieldses.org> wrote:
>> On Wed, Oct 10, 2007 at 06:30:02PM -0300, Han-Wen Nienhuys wrote:
>>> could it be that GC does not handle cyclic alternates correctly?
>> Does it handle alternates at all? If you run git-gc on a repository
>> which other repositories get objects from, then my impression was that
>> bad things happen.
>>
>
> AFAIK 'git gc' is safe, while 'git gc --prune' will remove loose
> (unreferenced) objects.

No, this is not the case, unless something has changed very recently
in git-gc or git-repack. Even git-gc with no arguments is unsafe if
the repository being gc'ed is listed in another's alternates.

git-gc calls repack with -a and -d, which causes a new pack to be
created which only contains the objects required by the local repository.
The other packs are then deleted. Objects contained in those packs and
required by a "sharing" repository (one using the alternates mechanism)
will be deleted if the local repository no longer references them.

Maybe git-gc should make use of repack's new -A option by default and
only use -a (and not -A) when --prune is specified...

-brandon

* Re: Spam: Re: git branch performance problem?
From: Mike Ralphson @ 2007-10-11 9:41 UTC
To: Brandon Casey; +Cc: Lars Hjemli, J. Bruce Fields, hanwen, git

On 10/10/07, Brandon Casey <casey@nrlssc.navy.mil> wrote:
> No, this is not the case, unless something has changed very recently
> in git-gc or git-repack. Even git-gc with no arguments is unsafe if
> the repository being gc'ed is listed in another's alternates.
>
> git-gc calls repack with -a and -d. which causes a new pack to be
> created which only contains the objects required by the local repository.
> The other packs are then deleted. Objects contained in those packs and
> required by a "sharing" repository (one using the alternates mechanism)
> will be deleted if the local repository no longer references them.

It's not something I've really looked into, but there seems to be a
reflogs mechanism which can temporarily pin an otherwise unreferenced
object so it doesn't get deleted. Would it be possible to populate the
remote's view of referenced objects into this, at the point of clone,
push or pull, which would seem to be the points at which this might be
changing?

Obviously this is of no use if you're 'anonymously' poncing off a
third repo to save clone time, but if you're in control of both repos
it might be useful.

Mike

* Re: Spam: Re: git branch performance problem?
From: Johannes Schindelin @ 2007-10-11 10:58 UTC
To: Mike Ralphson; +Cc: Brandon Casey, Lars Hjemli, J. Bruce Fields, hanwen, git

Hi,

On Thu, 11 Oct 2007, Mike Ralphson wrote:

> It's not something I've really looked into, but there seems to be a
> reflogs mechanism which can temporarily pin an otherwise unreferenced
> object so it doesn't get deleted. Would it be possible to populate the
> remote's view of referenced objects into this, at the point of clone,
> push or pull, which would seem to be the points at which this might be
> changing.
>
> Obviously this is of no use if you're 'anonymously' poncing off a
> third repo to save clone time, but if you're in control of both repo's
> it might be useful.

I cannot really allege that I understood what you were trying to say, but
I guess you want to use clone to get rid of objects you just threw out by
either filter-branch or deleting a branch.

The answer is that the file:// as well as the git:// protocol will do
that. For local clones, they are not the default, since they are slower
than hardlinking.

Hth,
Dscho

* Re: git branch performance problem?
From: Linus Torvalds @ 2007-10-10 23:39 UTC
To: hanwen; +Cc: Lars Hjemli, git

On Wed, 10 Oct 2007, Han-Wen Nienhuys wrote:
>
> The way I solved that, was to have both repositories pointing to each
> other, using alternates.

Ouch. Double un-good.

Not a good idea. Especially not if you do development in both and pull and
push between them.

What will happen is that if you do alternates pointing both ways, you
basically end up having a "shared pool of objects". So it's pretty much
equivalent to just using a shared object directory, and it has *exactly*
the same issues with object reachability and references: you have a shared
pool of objects, but you only ever see *one* set of references, so garbage
collection cannot work - because it will always see just a subset of the
real references, while it sees essentially all objects.

> could it be that GC does not handle cyclic alternates correctly?

It's not about cyclic per se: it's about the fact that GC will do garbage
collection based on reachability with the local references.

Which is normally fine. It's normally fine, because the object tree is
"local" too. But when doing alternates:

 - the tree that is being used as an alternate *has* to be totally stable.
   It must *never* have been re-based, or have any GC'able objects in the
   first place.

   IOW, doing a "git gc" on it will be safe, because there is no way any
   objects that the other alternate depends on could be pruned.

 - You definitely must *not* do a two-way alternate, because that violates
   another rule: the rule that the "alternate base" (which is now *both*
   of the repositories) is self-sufficient. Since they both point to each
   other, there's no way to know whether they are self-sufficient or not:
   they may be re-using each other's objects *and* packs!

And in the above, the "*and* packs" is important, and probably the cause
of your problems. Because "git repack -a -d -l" (which is what "git gc"
does) will always gather up any loose objects even from remote sites, but
the "-l" means that it will not do so for alternate packed objects.

So what happens is that if one of the repositories can reach some object
that is in a pack in the other repository, "git gc" will still *leave* it
dependent on a pack in the other repository. But maybe that object isn't
even reachable in the other repo any more (for whatever reason - a rebase,
whatever), then when you repack the other repository, now all the packs
will be replaced by one new pack - and the one new pack will only contain
the objects reachable from the other repo.

IOW: alternates are dangerous. A shared object directory is dangerous. You
should basically only do it under very controlled circumstances, and
otherwise you should use either hardlinks or if you want added safety,
totally separate repositories.

Basically, here's an example of badness, with A and B being repos that
point to each other.

 - do something in A

 - pull it into B

 - this leaves the objects in A, because of the alternates link.

 - rebase A

 - "git gc" in A: this removes unreachable objects from A, and now B is
   screwed.

So the rule really is: never *ever* do anything but fast-forward in a repo
that is an alternate for another one. If you do a circular link, I think
it's still safe if you follow that rule, but now obviously the rule holds
for *both* repos (and quite frankly, I'd worry so much that I'd never do
it even then).

There should be another rule too: git on its own is not a backup system.
You can use git *as* a backup system, but you need to do so by mirroring
the whole repository, and not on the same disk.

(ie, for me, git *is* a backup system, but that's only because I push my
repos to other sites - a single git repo on its own has zero redundancy)

		Linus

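The A/B sequence above can be replayed on a toy model. The sketch below is an assumption-laden simplification: objects are dictionary entries, "gc" simply drops local objects that local refs cannot reach, and packs, reflogs and grace periods are ignored entirely. The Repo class and its method names are invented for the illustration and are not git internals.

```python
class Repo:
    """Toy repository: a local object store, local refs, and alternates."""

    def __init__(self, name):
        self.name = name
        self.objects = {}     # object id -> list of ids it points to (local only)
        self.refs = {}        # ref name -> object id
        self.alternates = []  # other Repo instances whose stores we may read

    def find(self, oid, seen=None):
        """Look up oid locally, then through alternates (cycles tolerated)."""
        seen = set() if seen is None else seen
        if id(self) in seen:
            return None
        seen.add(id(self))
        if oid in self.objects:
            return self.objects[oid]
        for alt in self.alternates:
            found = alt.find(oid, seen)
            if found is not None:
                return found
        return None

    def reachable(self):
        """Ids reachable from this repository's own refs only."""
        todo, seen = list(self.refs.values()), set()
        while todo:
            oid = todo.pop()
            if oid not in seen:
                seen.add(oid)
                todo.extend(self.find(oid) or [])
        return seen

    def gc(self):
        """Drop local objects that the local refs cannot reach."""
        keep = self.reachable()
        self.objects = {o: c for o, c in self.objects.items() if o in keep}

    def missing(self):
        """Reachable ids that can no longer be found anywhere."""
        return {o for o in self.reachable() if self.find(o) is None}


a, b = Repo("A"), Repo("B")
a.alternates, b.alternates = [b], [a]        # the two-way link

a.objects.update({"c1": [], "c2": ["c1"]})   # do something in A
a.refs["master"] = "c2"
b.refs["master"] = "c2"                      # pull into B: objects stay in A
before = b.missing()

a.objects["c2-rebased"] = ["c1"]             # rebase A
a.refs["master"] = "c2-rebased"
a.gc()                                       # gc in A sees only A's refs
after = b.missing()

print(before, after)
```

The gc step consults only A's refs, so it has no way to know that B still needs c2. That is the "subset of the real references" problem in miniature.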
* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-11 2:26 UTC
To: Linus Torvalds; +Cc: Lars Hjemli, git

2007/10/10, Linus Torvalds <torvalds@linux-foundation.org>:

> IOW: alternates are dangerous. A shared object directory is dangerous. You
> should basically only do it under very controlled circumstances, and
> otherwise you should use either hardlinks or if you want added safety,
> totally separate repositories.

I recall reading a few months ago that it was "clone -l" that gave you
the jeebies, rather than "clone -s".

> So the rule really is: never *ever* do anything but fast-forward in a repo
>[..]

Methinks this is all too difficult. I will use clone -l henceforth. Is
there any reason to prefer -s over -l?

Given your lengthy exposition on the dangers of alternates, I would say
this is a feature that deserves to be buried or at least deemphasized
in the documentation.

For cherrypicking convenience, I would still appreciate it if there
was a mechanism similar to alternates that would allow me to view
objects from an alternate repo; objects found through this mechanism
should never be assumed to be present in the database, of course.

--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: Alex Riesen @ 2007-10-11 6:41 UTC
To: hanwen; +Cc: Linus Torvalds, Lars Hjemli, git

Han-Wen Nienhuys, Thu, Oct 11, 2007 04:26:24 +0200:
> > So the rule really is: never *ever* do anything but fast-forward in a repo
> >[..]
>
> Methinks this is all too difficult. I will use clone -l henceforth.

It is the current default for local clones.

* Re: git branch performance problem?
From: Johannes Schindelin @ 2007-10-11 10:46 UTC
To: hanwen; +Cc: Linus Torvalds, Lars Hjemli, git

Hi,

On Wed, 10 Oct 2007, Han-Wen Nienhuys wrote:

> For cherrypicking convenience, I would still appreciate it if there was
> a mechanism similar to alternates that would allow me to view objects
> from an alternate repo; objects found through this mechanism should
> never be assumed to be present in the database, of course.

Silly question: why don't you just

	git remote add -f other <url>

and then review the changes with "git log", "git diff" and "git show"?

Ciao,
Dscho

* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-11 13:11 UTC
To: Johannes Schindelin; +Cc: Linus Torvalds, Lars Hjemli, git

2007/10/11, Johannes Schindelin <Johannes.Schindelin@gmx.de>:
> > For cherrypicking convenience, I would still appreciate it if there was
> > a mechanism similar to alternates that would allow me to view objects
> > from an alternate repo; objects found through this mechanism should
> > never be assumed to be present in the database, of course.
>
> Silly question: why don't you just
>
>         git remote add -f other <url>
>
> and then review the changes with "git log", "git diff" and "git show"?

Thanks for the tip; I'll look into it.

--
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: Linus Torvalds @ 2007-10-11 15:16 UTC
To: hanwen; +Cc: Lars Hjemli, git

On Wed, 10 Oct 2007, Han-Wen Nienhuys wrote:
>
> I recall reading a few months ago that it was "clone -l" that gave you
> the jeebies, rather than "clone -s".

Yes, "clone -l" gives me the jeebies, because I'm a totally anal person
when it comes to disk corruption and a worry-wart. I've just had it happen
too many times (usually because a disk simply goes bad), and "git clone
-l" basically means that if one repository gets corrupted, then so does
the other one.

But clone -s gives me even *more* jeebies, although I think it's in some
respect also more useful.

The alternates thing is really useful for servers in particular, where you
basically want to have multiple "branches" maintained by lots of people,
but all based on some expected base version.

So if you think of alternates as a "kernel.org" or "repo.or.cz" thing,
where you might have a hundred different repositories all based on the
same "standard" version, then I think you basically have the right model.
In that situation, "git clone -l" doesn't work that well, since the
repositories just start out sharing data, but don't do it long term.

So "git clone -l" (which is the default now - my jeebies really are my
personal psychological problem) is really useful for latency reasons for a
local clone, and has basically no real downsides. It's not useful for
*backups*, but it's useful for development.

> > So the rule really is: never *ever* do anything but fast-forward in a repo
> >[..]
>
> Methinks this is all too difficult. I will use clone -l henceforth. Is
> there any reason to prefer -s over -l?

Good. And no, for actual *development* there is no reason to prefer -s
over -l (and as mentioned, '-l' is the default in modern versions).

For a git *server* setup, -s is better, since it's more long-term. But in
that situation, it also requires that the server maintainer have some
rules (ie only use "-s" for stable base trees and/or use extra care when
repacking the base).

> Given your lengthy exposition on the dangers of alternates, I would say
> this is a features that deserves to be buried or at least deemphasized
> in the documentation.

I do agree. We should make the dangers very clear.

> For cherrypicking convenience, I would still appreciate it if there
> was a mechanism similar to alternates that would allow me to view
> objects from an alternate repo; objects found through this mechanism
> should never be assumed to be present in the database, of course.

Well, the way that really should work is that you "git fetch remote" and
work on the end result in a "remote branch".

That *will* make the objects present in the database, but not in your
actual branches (until you cherry-pick), but there really are no real
downsides. If the remote is truly related to your local tree, it all
delta's so well that the disk space issues should basically be none.

		Linus

* Re: git branch performance problem?
From: Salikh Zakirov @ 2007-10-12 17:19 UTC
To: git; +Cc: Linus Torvalds, Lars Hjemli, git

Han-Wen Nienhuys wrote:
> For cherrypicking convenience, I would still appreciate it if there
> was a mechanism similar to alternates that would allow me to view
> objects from an alternate repo; objects found through this mechanism
> should never be assumed to be present in the database, of course.

There exists a script, contrib/workdir/git-new-workdir, which creates a new
working copy that literally shares the same object store.

It will share both the object store and the branches, so some care must be
taken: a branch which is checked out in one shared working directory must
never be updated (committed or pulled into) from the other shared working
directory.

That said, I personally find this trick very useful for browsing alternate
branch code and quick bug fixing.

* Re: git branch performance problem?
From: Lars Hjemli @ 2007-10-10 21:34 UTC
To: hanwen; +Cc: git

On 10/10/07, Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> 2007/10/10, Han-Wen Nienhuys <hanwenn@gmail.com>:
> > > You probably want to run 'git gc' (which will run 'git pack-refs',
> > > i.e. put all files currently under .git/refs into a single file). This
> > > should speed up 'git branch' (and quite possibly other commands too).
> >
> > This seems rather unuseful. After running gc pack-refs --all, I lost my HEAD,
> >
> > hanwen@lilypond:~/vc/git5$ git show HEAD
> > fatal: ambiguous argument 'HEAD': unknown revision or path not in the
> > working tree.
>
> More to the point, I seemed to have lost my entire repository. This is
> the type of surprise I don't enjoy.

Yeah, this is bad, I'm sorry to have caused you trouble. But I fail to
see how 'git pack-refs --all' could possibly trash your repository.

A few questions:

What version of git are you using?

What's the output from these commands:
$ cat .git/packed-refs
$ cat .git/HEAD
$ find .git/refs -type f | wc -l

> Now, can someone explain why 'git branch' takes forever if there are
> only two non-remote branches ?

That's because git-branch always traverses the complete directory tree
below .git/refs, even if you only want to see the 'local' branches (I
have a patch cooking to fix this).

--
larsh

* [PATCH] git-branch: only traverse the requested refs
2007-10-10 21:24 ` Han-Wen Nienhuys
2007-10-10 21:30 ` Han-Wen Nienhuys
2007-10-10 21:34 ` Lars Hjemli
@ 2007-10-10 21:54 ` Lars Hjemli
2007-10-10 23:00 ` Johannes Schindelin
2 siblings, 1 reply; 26+ messages in thread
From: Lars Hjemli @ 2007-10-10 21:54 UTC (permalink / raw)
To: Han-Wen Nienhuys; +Cc: git, Junio C Hamano
This avoids looking at every single file below .git/refs when git-branch
is fetching the list of refs to display.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
---
This patch should make git-branch much more efficient when there exists
many files below .git/refs, but it does require two passes through
.git/packed-refs when -a is specified. No benchmarking performed...
 builtin-branch.c |   28 +++++++++-------------------
 1 files changed, 9 insertions(+), 19 deletions(-)
diff --git a/builtin-branch.c b/builtin-branch.c
index 3da8b55..466e1e0 100644
--- a/builtin-branch.c
+++ b/builtin-branch.c
@@ -185,25 +185,8 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags,
 {
 	struct ref_list *ref_list = (struct ref_list*)(cb_data);
 	struct ref_item *newitem;
-	int kind = REF_UNKNOWN_TYPE;
 	int len;
 
-	/* Detect kind */
-	if (!prefixcmp(refname, "refs/heads/")) {
-		kind = REF_LOCAL_BRANCH;
-		refname += 11;
-	} else if (!prefixcmp(refname, "refs/remotes/")) {
-		kind = REF_REMOTE_BRANCH;
-		refname += 13;
-	} else if (!prefixcmp(refname, "refs/tags/")) {
-		kind = REF_TAG;
-		refname += 10;
-	}
-
-	/* Don't add types the caller doesn't want */
-	if ((kind & ref_list->kinds) == 0)
-		return 0;
-
 	/* Resize buffer */
 	if (ref_list->index >= ref_list->alloc) {
 		ref_list->alloc = alloc_nr(ref_list->alloc);
@@ -214,7 +197,7 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags,
 	/* Record the new item */
 	newitem = &(ref_list->list[ref_list->index++]);
 	newitem->name = xstrdup(refname);
-	newitem->kind = kind;
+	newitem->kind = ref_list->kinds;
 	hashcpy(newitem->sha1, sha1);
 	len = strlen(newitem->name);
 	if (len > ref_list->maxwidth)
@@ -296,8 +279,15 @@ static void print_ref_list(int kinds, int detached, int verbose, int abbrev)
 	struct ref_list ref_list;
 
 	memset(&ref_list, 0, sizeof(ref_list));
+	if (kinds & REF_LOCAL_BRANCH) {
+		ref_list.kinds = REF_LOCAL_BRANCH;
+		for_each_branch_ref(append_ref, &ref_list);
+	}
+	if (kinds & REF_REMOTE_BRANCH) {
+		ref_list.kinds = REF_REMOTE_BRANCH;
+		for_each_remote_ref(append_ref, &ref_list);
+	}
 	ref_list.kinds = kinds;
-	for_each_ref(append_ref, &ref_list);
 
 	qsort(ref_list.list, ref_list.index, sizeof(struct ref_item), ref_cmp);
 
--
1.5.3.4.206.g58ba4
* Re: [PATCH] git-branch: only traverse the requested refs
2007-10-10 21:54 ` [PATCH] git-branch: only traverse the requested refs Lars Hjemli
@ 2007-10-10 23:00 ` Johannes Schindelin
2007-10-10 23:30 ` Lars Hjemli
0 siblings, 1 reply; 26+ messages in thread
From: Johannes Schindelin @ 2007-10-10 23:00 UTC (permalink / raw)
To: Lars Hjemli; +Cc: Han-Wen Nienhuys, git, Junio C Hamano
Hi,
On Wed, 10 Oct 2007, Lars Hjemli wrote:
> This avoids looking at every single file below .git/refs when git-branch
> is fetching the list of refs to display.
>
> [...]
>
> +	if (kinds & REF_LOCAL_BRANCH) {
> +		ref_list.kinds = REF_LOCAL_BRANCH;
> +		for_each_branch_ref(append_ref, &ref_list);
> +	}
The function for_each_branch_ref() calls do_for_each_ref(), which in turn
calls get_loose_refs(), which calls get_ref_dir() to read all loose refs,
if they have not yet been read.
So I think that your patch (unfortunately) will not help Han-Wen's situation.
Ciao,
Dscho
* Re: [PATCH] git-branch: only traverse the requested refs
2007-10-10 23:00 ` Johannes Schindelin
@ 2007-10-10 23:30 ` Lars Hjemli
0 siblings, 0 replies; 26+ messages in thread
From: Lars Hjemli @ 2007-10-10 23:30 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Han-Wen Nienhuys, git, Junio C Hamano
On 10/11/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Wed, 10 Oct 2007, Lars Hjemli wrote:
> > +	if (kinds & REF_LOCAL_BRANCH) {
> > +		ref_list.kinds = REF_LOCAL_BRANCH;
> > +		for_each_branch_ref(append_ref, &ref_list);
> > +	}
>
> The function for_each_branch_ref() calls do_for_each_ref(), which in turn
> calls get_loose_refs(), which calls get_ref_dir() to read all loose refs,
> if they have not yet been read.
Ok, I'll see if get_loose_refs() could take 'const char *base' and
pass this on to get_ref_dir(), which should solve the problem.
Thanks for noticing.
--
larsh
* Re: git branch performance problem?
@ 2007-10-12 17:32 Salikh Zakirov
0 siblings, 0 replies; 26+ messages in thread
From: Salikh Zakirov @ 2007-10-12 17:32 UTC (permalink / raw)
To: hanwen; +Cc: Linus Torvalds, Lars Hjemli, git
Han-Wen Nienhuys wrote:
> For cherrypicking convenience, I would still appreciate it if there
> was a mechanism similar to alternates that would allow me to view
> objects from an alternate repo; objects found through this mechanism
> should never be assumed to be present in the database, of course.
There exists a script, contrib/workdir/git-new-workdir,
which creates a new working copy that literally shares the same object store.
It shares both the object store and the branches, so some care must be taken:
a branch that is checked out in one shared working directory must never be
updated (committed to or pulled into) from the other shared working directory.
That said, I personally find this trick very useful for browsing code
from a different branch and for quick bug fixing.