git.vger.kernel.org archive mirror
* git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-10 20:22 UTC
  To: git

Hello,

I'm seeing very slow performance with 'git-branch'.  Is this the
canonical way to find out the current branch? (I know I can look into
.git/HEAD, but how likely is that to break in the future?)

hanwen@lilypond:/tmp/z$ time git branch
* foo
  master

real    0m0.307s
user    0m0.232s
sys     0m0.038s

hanwen@lilypond:/tmp/z$ git --version
git version 1.5.1.rc1.949.g322bc


On NFS this takes 5 seconds. Note that I have a humongous number of
remote branches, but those should not be examined without -r, right?

hanwen@lilypond:/tmp/z$ find .git/refs/remotes | wc -l
1856

-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: Lars Hjemli @ 2007-10-10 20:44 UTC
  To: hanwen; +Cc: git

On 10/10/07, Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> I'm seeing very slow performance with 'git-branch'.  Is this the
> canonical way to find out the current branch?

You could also try 'git symbolic-ref HEAD', but see below...
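For illustration only: the sketch below shows roughly what `git symbolic-ref HEAD` reports, by parsing the usual on-disk form of a symbolic HEAD, a single line `ref: refs/heads/<branch>`. The helper name `head_branch` is hypothetical. Reading the file directly is the fragile route Han-Wen asks about: older Git versions could store HEAD as a symlink instead, and a detached HEAD holds a raw SHA-1, so the command is the safer interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given the contents of .git/HEAD, copy the short
 * branch name into out. Returns 1 for a symbolic ref under refs/heads/,
 * 0 otherwise (for example a detached HEAD holding a raw SHA-1, or a
 * name that does not fit in out).
 */
static int head_branch(const char *contents, char *out, size_t outlen)
{
	static const char prefix[] = "ref: refs/heads/";
	size_t plen = sizeof(prefix) - 1;
	size_t len;

	assert(contents != NULL && out != NULL && outlen > 0);
	if (strncmp(contents, prefix, plen) != 0)
		return 0;
	contents += plen;
	/* The branch name runs up to the trailing newline, if any. */
	len = strcspn(contents, "\n");
	if (len == 0 || len >= outlen)
		return 0;
	memcpy(out, contents, len);
	out[len] = '\0';
	return 1;
}
```

With the HEAD from the session above, `head_branch("ref: refs/heads/foo\n", buf, sizeof buf)` would yield `foo`, matching the `* foo` line that `git branch` prints.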

> hanwen@lilypond:/tmp/z$ find .git/refs/remotes | wc -l
> 1856

You probably want to run 'git gc' (which will run 'git pack-refs',
i.e. put all files currently under .git/refs into a single file). This
should speed up 'git branch' (and quite possibly other commands too).
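A rough illustration of why packing helps: `.git/packed-refs` is one text file that can be read in a single pass, instead of one file (and, on NFS, at least one round trip) per ref. The parser below assumes the format as commonly seen: an optional header line starting with `#`, then `<40-hex SHA-1> <refname>` lines, with `^<SHA-1>` peel lines after annotated tags. It is a sketch, not Git's own parser, and `count_packed_refs` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Count the entries in a .git/packed-refs buffer whose refname starts
 * with prefix. Assumed format: an optional header line starting with
 * '#', then "<40-hex SHA-1> <refname>" lines, with "^<SHA-1>" peel
 * lines following annotated tags.
 */
static int count_packed_refs(const char *buf, const char *prefix)
{
	size_t plen;
	int count = 0;

	assert(buf != NULL && prefix != NULL);
	plen = strlen(prefix);
	while (*buf) {
		const char *eol = strchr(buf, '\n');
		size_t len = eol ? (size_t)(eol - buf) : strlen(buf);

		/* Only "sha1 SP refname" lines name a ref; skip the rest. */
		if (buf[0] != '#' && buf[0] != '^' && len > 41 && buf[40] == ' ') {
			const char *name = buf + 41;
			if (len - 41 >= plen && strncmp(name, prefix, plen) == 0)
				count++;
		}
		buf += len;
		if (*buf == '\n')
			buf++;
	}
	return count;
}
```

Restricting the count to one prefix still scans the whole file, but that is a single sequential read rather than a directory walk over every ref.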

--
larsh

* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-10 21:17 UTC
  To: Lars Hjemli; +Cc: git

2007/10/10, Lars Hjemli <hjemli@gmail.com>:
> On 10/10/07, Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> > I'm seeing very slow performance with 'git-branch'.  Is this the
> > canonical way to find out the current branch?
>
> You could also try 'git symbolic-ref HEAD', but see below...
>
> > hanwen@lilypond:/tmp/z$ find .git/refs/remotes | wc -l
> > 1856
>
> You probably want to run 'git gc' (which will run 'git pack-refs',
> i.e. put all files currently under .git/refs into a single file). This
> should speed up 'git branch' (and quite possibly other commands too).

This seems rather unuseful. After running gc pack-refs --all, I lost my HEAD,

hanwen@lilypond:~/vc/git5$ git show HEAD
fatal: ambiguous argument 'HEAD': unknown revision or path not in the
working tree.
Use '--' to separate paths from revisions

Is there a way to only pack refs under a certain subdirectory of .git/refs ?
(I'm thinking of .git/refs/remotes )

-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-10 21:24 UTC
  To: Lars Hjemli; +Cc: git

2007/10/10, Han-Wen Nienhuys <hanwenn@gmail.com>:
> > You probably want to run 'git gc' (which will run 'git pack-refs',
> > i.e. put all files currently under .git/refs into a single file). This
> > should speed up 'git branch' (and quite possibly other commands too).
>
> This seems rather unuseful. After running gc pack-refs --all, I lost my HEAD,
>
> hanwen@lilypond:~/vc/git5$ git show HEAD
> fatal: ambiguous argument 'HEAD': unknown revision or path not in the
> working tree.

More to the point, I seemed to have lost my entire repository. This is
the type of surprise I don't enjoy.

Now, can someone explain why 'git branch' takes forever if there are
only two non-remote branches ?

-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-10 21:30 UTC
  To: Lars Hjemli; +Cc: git

2007/10/10, Han-Wen Nienhuys <hanwenn@gmail.com>:
> More to the point, I seemed to have lost my entire repository. This is
> the type of surprise I don't enjoy.
>
> Now, can someone explain why 'git branch' takes forever if there are
> only two non-remote branches ?

So,

Here is a question: I would like to share commitishes between two checkouts
of a repository. The reason for this is that I want to easily cherry
pick back and forth between the two. The files in one of them
should be continually available, since I am running out of that
directory.

The way I solved that was to have both repositories pointing to each
other, using alternates.

Now, after a couple of gc and pack-refs iterations, I am greeted by

hanwen@lilypond:~/vc/git6$ git fsck
missing tree 12b00ec3190f7b46a5fe0a3235445bead4c9645b
broken link from    tree 1718d09e0394d113c162e4a3471e7a1f20914a94
              to    blob 635e2802568b85017007698c0e6dd4d28dca496f
broken link from    tree 926899798fce75038e24f8fa1838f6da8bcf105f
              to    tree f1b852d270ebbaaf95d8ddc06c52763bad11ff25
missing blob 99f0c0d63276fce444e3a200167b636236784c52
missing tree f1b852d270ebbaaf95d8ddc06c52763bad11ff25
missing blob 236962a87fafae8ca2dce2dc550d344aa7a8884a
missing blob 7d69ca297f392a954c4cdcb62bb4c8a90ddb862b
missing blob 9e39be8f5cb4eeff97fcfd6eb77fefeda02f0e71
dangling blob f3a93f023080ce9fc6becb397e366cc4ceb192f5


could it be that GC does not handle cyclic alternates correctly?

-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: Lars Hjemli @ 2007-10-10 21:34 UTC
  To: hanwen; +Cc: git

On 10/10/07, Han-Wen Nienhuys <hanwenn@gmail.com> wrote:
> 2007/10/10, Han-Wen Nienhuys <hanwenn@gmail.com>:
> > > You probably want to run 'git gc' (which will run 'git pack-refs',
> > > i.e. put all files currently under .git/refs into a single file). This
> > > should speed up 'git branch' (and quite possibly other commands too).
> >
> > This seems rather unuseful. After running gc pack-refs --all, I lost my HEAD,
> >
> > hanwen@lilypond:~/vc/git5$ git show HEAD
> > fatal: ambiguous argument 'HEAD': unknown revision or path not in the
> > working tree.
>
> More to the point, I seemed to have lost my entire repository. This is
> the type of surprise I don't enjoy.

Yeah, this is bad, I'm sorry to have caused you trouble. But I fail to
see how 'git pack-refs --all' could possibly trash your repository. A
few questions:

What version of git are you using?
What's the output from these commands:
$ cat .git/packed-refs
$ cat .git/HEAD
$ find .git/refs -type f | wc -l

> Now, can someone explain why 'git branch' takes forever if there are
> only two non-remote branches ?

That's because git-branch always traverses the complete directory tree
below .git/refs, even if you only want to see the 'local' branches (I
have a patch cooking to fix this).
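The filtering Lars describes can be sketched as below. The prefixes and the classify-then-filter shape follow the `append_ref()` code that his patch in this thread removes; the enum names, `classify_ref()` and `count_wanted()` are illustrative stand-ins, not Git's API. The point is that every ref is read and classified before unwanted kinds are dropped, so the cost tracks all 1856 remote refs even when only two local branches are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative kind flags, in the spirit of builtin-branch.c. */
enum ref_kind { KIND_UNKNOWN = 0, KIND_LOCAL = 1, KIND_REMOTE = 2, KIND_TAG = 4 };

/* Classify a full refname by prefix and point *shortname past it. */
static enum ref_kind classify_ref(const char *refname, const char **shortname)
{
	static const struct {
		const char *prefix;
		enum ref_kind kind;
	} table[] = {
		{ "refs/heads/", KIND_LOCAL },
		{ "refs/remotes/", KIND_REMOTE },
		{ "refs/tags/", KIND_TAG },
	};
	size_t i;

	assert(refname != NULL && shortname != NULL);
	*shortname = refname;
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		size_t len = strlen(table[i].prefix);
		if (strncmp(refname, table[i].prefix, len) == 0) {
			*shortname = refname + len;
			return table[i].kind;
		}
	}
	return KIND_UNKNOWN;
}

/*
 * Every ref is visited and classified; only then are unwanted kinds
 * dropped. The work is proportional to n, not to the number kept.
 */
static int count_wanted(const char **refs, int n, int wanted)
{
	const char *shortname;
	int i, kept = 0;

	for (i = 0; i < n; i++)
		if (classify_ref(refs[i], &shortname) & wanted)
			kept++;
	return kept;
}
```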

--
larsh

* Re: git branch performance problem?
From: J. Bruce Fields @ 2007-10-10 21:39 UTC
  To: hanwen; +Cc: Lars Hjemli, git

On Wed, Oct 10, 2007 at 06:30:02PM -0300, Han-Wen Nienhuys wrote:
> 2007/10/10, Han-Wen Nienhuys <hanwenn@gmail.com>:
> > More to the point, I seemed to have lost my entire repository. This is
> > the type of surprise I don't enjoy.
> >
> > Now, can someone explain why 'git branch' takes forever if there are
> > only two non-remote branches ?
> 
> So,
> 
> Here is a question: I would like to share commitishes between two checkouts
> of a repository. The reason for this is that I want to easily cherry
> pick back and forth between the two. The files in one of them
> should be continually available, since I am running out of that
> directory.
> 
> The way I solved that was to have both repositories pointing to each
> other, using alternates.
> 
> Now, after a couple of gc and pack-refs iterations, I am greeted by
> 
> hanwen@lilypond:~/vc/git6$ git fsck
> missing tree 12b00ec3190f7b46a5fe0a3235445bead4c9645b
> broken link from    tree 1718d09e0394d113c162e4a3471e7a1f20914a94
>               to    blob 635e2802568b85017007698c0e6dd4d28dca496f
> broken link from    tree 926899798fce75038e24f8fa1838f6da8bcf105f
>               to    tree f1b852d270ebbaaf95d8ddc06c52763bad11ff25
> missing blob 99f0c0d63276fce444e3a200167b636236784c52
> missing tree f1b852d270ebbaaf95d8ddc06c52763bad11ff25
> missing blob 236962a87fafae8ca2dce2dc550d344aa7a8884a
> missing blob 7d69ca297f392a954c4cdcb62bb4c8a90ddb862b
> missing blob 9e39be8f5cb4eeff97fcfd6eb77fefeda02f0e71
> dangling blob f3a93f023080ce9fc6becb397e366cc4ceb192f5
> 
> 
> could it be that GC does not handle cyclic alternates correctly?

Does it handle alternates at all?  If you run git-gc on a repository
which other repositories get objects from, then my impression was that
bad things happen.

--b.

* Re: git branch performance problem?
From: Lars Hjemli @ 2007-10-10 21:45 UTC
  To: J. Bruce Fields; +Cc: hanwen, git

On 10/10/07, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Wed, Oct 10, 2007 at 06:30:02PM -0300, Han-Wen Nienhuys wrote:
> > could it be that GC does not handle cyclic alternates correctly?
>
> Does it handle alternates at all?  If you run git-gc on a repository
> which other repositories get objects from, then my impression was that
> bad things happen.
>

AFAIK 'git gc' is safe, while 'git gc --prune' will remove loose
(unreferenced) objects.

-- 
larsh

* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-10 21:49 UTC
  To: Lars Hjemli; +Cc: J. Bruce Fields, git

2007/10/10, Lars Hjemli <hjemli@gmail.com>:
> On 10/10/07, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Wed, Oct 10, 2007 at 06:30:02PM -0300, Han-Wen Nienhuys wrote:
> > > could it be that GC does not handle cyclic alternates correctly?
> >
> > Does it handle alternates at all?  If you run git-gc on a repository
> > which other repositories get objects from, then my impression was that
> > bad things happen.
> >
>
> AFAIK 'git gc' is safe, while 'git gc --prune' will remove loose
> (unreferenced) objects.

Yes, I think that in this case, gc --prune was run accidentally, but
given that the history of the program invoking git just died, I'm not
sure how to figure that out.

Maybe gc --prune could follow the alternates and abort if a cycle was detected?
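Han-Wen's suggestion amounts to cycle detection on the graph formed by the alternates files. A minimal sketch, assuming a hypothetical in-memory model where `alt[i][j]` is nonzero when repository `i` lists repository `j` in its alternates; it uses a three-color depth-first search, where reaching a repository that is still on the current search path means a cycle. Loading the real `objects/info/alternates` files is left out.

```c
#include <assert.h>

#define MAX_REPOS 16

enum { WHITE = 0, GREY, BLACK };

/* Depth-first search from repo r; returns 1 if a cycle is reachable. */
static int visit(int n, int alt[][MAX_REPOS], int *color, int r)
{
	int j;

	color[r] = GREY; /* r is on the current search path */
	for (j = 0; j < n; j++) {
		if (!alt[r][j])
			continue;
		if (color[j] == GREY)
			return 1; /* back edge: following alternates returns to the path */
		if (color[j] == WHITE && visit(n, alt, color, j))
			return 1;
	}
	color[r] = BLACK; /* fully explored, no cycle through r */
	return 0;
}

/* alt[i][j] != 0 means repo i lists repo j in its alternates (toy model). */
static int alternates_have_cycle(int n, int alt[][MAX_REPOS])
{
	int color[MAX_REPOS] = { WHITE };
	int r;

	assert(n > 0 && n <= MAX_REPOS);
	for (r = 0; r < n; r++)
		if (color[r] == WHITE && visit(n, alt, color, r))
			return 1;
	return 0;
}
```

Such a check would catch the two-way setup described here, but a plain one-way borrower is not a cycle, so it would not protect that case.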

-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: J. Bruce Fields @ 2007-10-10 21:53 UTC
  To: hanwen; +Cc: Lars Hjemli, git

On Wed, Oct 10, 2007 at 06:49:19PM -0300, Han-Wen Nienhuys wrote:
> 2007/10/10, Lars Hjemli <hjemli@gmail.com>:
> > On 10/10/07, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > On Wed, Oct 10, 2007 at 06:30:02PM -0300, Han-Wen Nienhuys wrote:
> > > > could it be that GC does not handle cyclic alternates correctly?
> > >
> > > Does it handle alternates at all?  If you run git-gc on a repository
> > > which other repositories get objects from, then my impression was that
> > > bad things happen.
> > >
> >
> > AFAIK 'git gc' is safe, while 'git gc --prune' will remove loose
> > (unreferenced) objects.
> 
> Yes, I think that in this case, gc --prune was run accidentally, but
> given that the history of the program invoking git just died, I'm not
> sure how to figure that out.
> 
> Maybe gc --prune could follow the alternates and abort if a cycle was detected?

Don't the alternates point in the wrong direction?  You'd need pointers
back from the main repository to the repositories that depend on it for
objects.

Which would be nice....
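To make Bruce's point concrete: alternates record only the borrower-to-lender edge, so a safe prune in the lender would need the reverse view. The sketch below computes it with a breadth-first search over a hypothetical matrix `alt[i][j]` (repository `i` borrows from `j`); Git keeps no such registry, and `find_borrowers()` is an illustrative name. Transitive borrowers are included, since alternates can be chained.

```c
#include <assert.h>

#define MAX_REPOS 16

/*
 * Toy model: alt[i][j] != 0 means repo i borrows objects from repo j.
 * Writes every repo that depends on `lender`, directly or through a
 * chain of alternates, into out (room for n entries) and returns how
 * many were found.
 */
static int find_borrowers(int n, int alt[][MAX_REPOS], int lender, int *out)
{
	int seen[MAX_REPOS] = { 0 };
	int queue[MAX_REPOS];
	int head = 0, tail = 0, count = 0, i;

	assert(n <= MAX_REPOS && lender >= 0 && lender < n);
	seen[lender] = 1;
	queue[tail++] = lender;
	while (head < tail) {
		int cur = queue[head++];

		for (i = 0; i < n; i++) {
			/* Follow the edge backwards: repo i lists cur as an alternate. */
			if (alt[i][cur] && !seen[i]) {
				seen[i] = 1;
				queue[tail++] = i;
				out[count++] = i;
			}
		}
	}
	return count;
}
```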

--b.

* Re: git branch performance problem?
From: Johannes Schindelin @ 2007-10-10 21:53 UTC
  To: hanwen; +Cc: Lars Hjemli, J. Bruce Fields, git

Hi,

On Wed, 10 Oct 2007, Han-Wen Nienhuys wrote:

> 2007/10/10, Lars Hjemli <hjemli@gmail.com>:
> > On 10/10/07, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > On Wed, Oct 10, 2007 at 06:30:02PM -0300, Han-Wen Nienhuys wrote:
> > > > could it be that GC does not handle cyclic alternates correctly?
> > >
> > > Does it handle alternates at all?  If you run git-gc on a repository 
> > > which other repositories get objects from, then my impression was 
> > > that bad things happen.
> > >
> >
> > AFAIK 'git gc' is safe, while 'git gc --prune' will remove loose
> > (unreferenced) objects.
> 
> Yes, I think that in this case, gc --prune was run accidentally, but
> given that the history of the program invoking git just died, I'm not
> sure how to figure that out.
> 
> Maybe gc --prune could follow the alternates and abort if a cycle was 
> detected?

I think we talked about this quite some time ago, and the resolution was 
that it is too hard.

Now that it bit somebody in real life, I think we have to try harder.

And probably the best place to check would be git-prune, not git-gc, since 
that is the program (called by gc) that most probably killed your repo.

Come to think of it, it should probably be part of git-repack, too.

Will try to cobble up a patch,
Dscho

* [PATCH] git-branch: only traverse the requested refs
From: Lars Hjemli @ 2007-10-10 21:54 UTC
  To: Han-Wen Nienhuys; +Cc: git, Junio C Hamano

This avoids looking at every single file below .git/refs when git-branch
is fetching the list of refs to display.

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
---

This patch should make git-branch much more efficient when there are
many files below .git/refs, but it does require two passes through
.git/packed-refs when -a is specified.

No benchmarking performed...

 builtin-branch.c |   28 +++++++++-------------------
 1 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/builtin-branch.c b/builtin-branch.c
index 3da8b55..466e1e0 100644
--- a/builtin-branch.c
+++ b/builtin-branch.c
@@ -185,25 +185,8 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags,
 {
 	struct ref_list *ref_list = (struct ref_list*)(cb_data);
 	struct ref_item *newitem;
-	int kind = REF_UNKNOWN_TYPE;
 	int len;
 
-	/* Detect kind */
-	if (!prefixcmp(refname, "refs/heads/")) {
-		kind = REF_LOCAL_BRANCH;
-		refname += 11;
-	} else if (!prefixcmp(refname, "refs/remotes/")) {
-		kind = REF_REMOTE_BRANCH;
-		refname += 13;
-	} else if (!prefixcmp(refname, "refs/tags/")) {
-		kind = REF_TAG;
-		refname += 10;
-	}
-
-	/* Don't add types the caller doesn't want */
-	if ((kind & ref_list->kinds) == 0)
-		return 0;
-
 	/* Resize buffer */
 	if (ref_list->index >= ref_list->alloc) {
 		ref_list->alloc = alloc_nr(ref_list->alloc);
@@ -214,7 +197,7 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags,
 	/* Record the new item */
 	newitem = &(ref_list->list[ref_list->index++]);
 	newitem->name = xstrdup(refname);
-	newitem->kind = kind;
+	newitem->kind = ref_list->kinds;
 	hashcpy(newitem->sha1, sha1);
 	len = strlen(newitem->name);
 	if (len > ref_list->maxwidth)
@@ -296,8 +279,15 @@ static void print_ref_list(int kinds, int detached, int verbose, int abbrev)
 	struct ref_list ref_list;
 
 	memset(&ref_list, 0, sizeof(ref_list));
+	if (kinds & REF_LOCAL_BRANCH) {
+		ref_list.kinds = REF_LOCAL_BRANCH;
+		for_each_branch_ref(append_ref, &ref_list);
+	}
+	if (kinds & REF_REMOTE_BRANCH) {
+		ref_list.kinds = REF_REMOTE_BRANCH;
+		for_each_remote_ref(append_ref, &ref_list);
+	}
 	ref_list.kinds = kinds;
-	for_each_ref(append_ref, &ref_list);
 
 	qsort(ref_list.list, ref_list.index, sizeof(struct ref_item), ref_cmp);
 
-- 
1.5.3.4.206.g58ba4

* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-10 22:01 UTC
  To: J. Bruce Fields; +Cc: Lars Hjemli, git

2007/10/10, J. Bruce Fields <bfields@fieldses.org>:
> > Maybe gc --prune could follow the alternates and abort if a cycle was detected?
>
> Don't the alternates point in the wrong direction?  You'd need pointers
> back from the main repository to the repositories that depend on it for
> objects.
>
> Which would be nice....

The development repo was cloned from the main repo; then sometimes I
cherry pick from development into the main repo. Hence alternates in 2
directions.

-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: Spam: Re: git branch performance problem?
From: Brandon Casey @ 2007-10-10 22:55 UTC
  To: Lars Hjemli; +Cc: J. Bruce Fields, hanwen, git

Lars Hjemli wrote:
> On 10/10/07, J. Bruce Fields <bfields@fieldses.org> wrote:
>> On Wed, Oct 10, 2007 at 06:30:02PM -0300, Han-Wen Nienhuys wrote:
>>> could it be that GC does not handle cyclic alternates correctly?
>> Does it handle alternates at all?  If you run git-gc on a repository
>> which other repositories get objects from, then my impression was that
>> bad things happen.
>>
> 
> AFAIK 'git gc' is safe, while 'git gc --prune' will remove loose
> (unreferenced) objects.

No, this is not the case, unless something has changed very recently
in git-gc or git-repack. Even git-gc with no arguments is unsafe if
the repository being gc'ed is listed in another's alternates.

git-gc calls repack with -a and -d, which causes a new pack to be
created which only contains the objects required by the local repository.
The other packs are then deleted. Objects contained in those packs and
required by a "sharing" repository (one using the alternates mechanism)
will be deleted if the local repository no longer references them.

Maybe git-gc should make use of repack's new -A option by default and
only use -a (and not -A) when --prune is specified...
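A simplified model of the difference Brandon describes, for one object that sits in one of the lender's existing packs. With `-a -d`, objects not reachable from the lender's own refs are not written to the new pack and disappear with the old packs; with `-A -d`, git-repack writes them out as loose objects instead. The enum, the bitmask encoding and the function names are hypothetical, and the model ignores `-l` and objects that were already loose.

```c
#include <assert.h>
#include <stdint.h>

enum fate { IN_NEW_PACK, LOOSENED, DELETED };

/*
 * Fate of one object that currently sits in a local pack when the
 * repository runs "git repack -a -d" (capital_a == 0) or
 * "git repack -A -d" (capital_a == 1).
 */
static enum fate repack_fate(int reachable_locally, int capital_a)
{
	assert(capital_a == 0 || capital_a == 1);
	if (reachable_locally)
		return IN_NEW_PACK;
	/* -A -d turns unreachable packed objects into loose ones. */
	return capital_a ? LOOSENED : DELETED;
}

/*
 * Bit i stands for object i. Count objects a borrowing repository
 * still needs that vanish from the lender's packs after the repack.
 */
static int objects_lost(uint32_t needed_by_borrower, uint32_t reachable_in_lender,
			uint32_t in_lender_packs, int capital_a)
{
	int i, lost = 0;

	for (i = 0; i < 32; i++) {
		uint32_t bit = (uint32_t)1 << i;

		if (!(needed_by_borrower & bit) || !(in_lender_packs & bit))
			continue;
		if (repack_fate((reachable_in_lender & bit) != 0, capital_a) == DELETED)
			lost++;
	}
	return lost;
}
```

Even in the `-A` case the loss is only postponed: the loosened objects are still unreachable from the lender, so a later prune deletes them.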

-brandon

* Re: [PATCH] git-branch: only traverse the requested refs
From: Johannes Schindelin @ 2007-10-10 23:00 UTC
  To: Lars Hjemli; +Cc: Han-Wen Nienhuys, git, Junio C Hamano

Hi,

On Wed, 10 Oct 2007, Lars Hjemli wrote:

> This avoids looking at every single file below .git/refs when git-branch 
> is fetching the list of refs to display.
> 
> [...]
>
> +	if (kinds & REF_LOCAL_BRANCH) {
> +		ref_list.kinds = REF_LOCAL_BRANCH;
> +		for_each_branch_ref(append_ref, &ref_list);
> +	}

The function for_each_branch_ref() calls do_for_each_ref(), which in turn 
calls get_loose_refs(), which calls get_ref_dir() to read all loose refs, 
if they have not yet been read.

So I think that your patch (unfortunately) will not help Han-Wen's
situation.

Ciao,
Dscho

* Re: [PATCH] git-branch: only traverse the requested refs
From: Lars Hjemli @ 2007-10-10 23:30 UTC
  To: Johannes Schindelin; +Cc: Han-Wen Nienhuys, git, Junio C Hamano

On 10/11/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Wed, 10 Oct 2007, Lars Hjemli wrote:
> > +     if (kinds & REF_LOCAL_BRANCH) {
> > +             ref_list.kinds = REF_LOCAL_BRANCH;
> > +             for_each_branch_ref(append_ref, &ref_list);
> > +     }
>
> The function for_each_branch_ref() calls do_for_each_ref(), which in turn
> calls get_loose_refs(), which calls get_ref_dir() to read all loose refs,
> if they have not yet been read.

Ok, I'll see if get_loose_refs() could take 'const char *base' and
pass this on to get_ref_dir(), which should solve the problem.

Thanks for noticing.

-- 
larsh

* Re: git branch performance problem?
From: Linus Torvalds @ 2007-10-10 23:39 UTC
  To: hanwen; +Cc: Lars Hjemli, git



On Wed, 10 Oct 2007, Han-Wen Nienhuys wrote:
> 
> The way I solved that was to have both repositories pointing to each
> other, using alternates.

Ouch. Double un-good. Not a good idea. Especially not if you do 
development in both and pull and push between them. 

What will happen is that if you do alternates pointing both ways, you 
basically end up having a "shared pool of objects". So it's pretty much 
equivalent to just using a shared object directory, and it has *exactly* 
the same issues with object reachability and references: you have a shared 
pool of objects, but you only ever see *one* set of references, so garbage 
collection cannot work - because it will always see just a subset of the 
real references, while it sees essentially all objects.
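Linus's point can be seen in a toy mark phase: marking starts only from one repository's own refs, while the object pool is shared. In the sketch below (a hypothetical adjacency matrix `edge[i][j]`, nonzero when object `i` references object `j`, and an illustrative `count_garbage()`), objects that only the other repository's refs reach come out unmarked, which is what a prune step would delete.

```c
#include <assert.h>

#define MAX_OBJS 16

/* Mark obj and everything it references, recursively. */
static void mark(int n, int edge[][MAX_OBJS], int *marked, int obj)
{
	int j;

	if (marked[obj])
		return;
	marked[obj] = 1;
	for (j = 0; j < n; j++)
		if (edge[obj][j])
			mark(n, edge, marked, j);
}

/* Count the objects a prune would treat as garbage for these roots. */
static int count_garbage(int n, int edge[][MAX_OBJS], const int *roots, int nroots)
{
	int marked[MAX_OBJS] = { 0 };
	int i, garbage = 0;

	assert(n <= MAX_OBJS);
	for (i = 0; i < nroots; i++)
		mark(n, edge, marked, roots[i]);
	for (i = 0; i < n; i++)
		if (!marked[i])
			garbage++;
	return garbage;
}
```

Passing both repositories' refs as roots marks everything; passing only one side's refs reports the other side's objects as garbage.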

> could it be that GC does not handle cyclic alternates correctly?

It's not about cyclic per se: it's about the fact that GC will do garbage 
collection based on reachability with the local references.

Which is normally fine. It's normally fine, because the object tree is 
"local" too. But when doing alternates:

 - the tree that is being used as an alternate *has* to be totally stable. 
   It must *never* have been re-based, or have any GC'able objects in the 
   first place. IOW, doing a "git gc" on it will be safe, because there is 
   no way any objects that the other alternate depends on could be pruned.

 - You definitely must *not* do a two-way alternate, because that violates 
   another rule: the rule that the "alternate base" (which is now *both*
   of the repositories) is self-sufficient. Since they both point to each 
   other, there's no way to know whether they are self-sufficient or not: 
   they may be re-using each others objects *and* packs!

And in the above, the "*and* packs" is important, and probably the cause 
of your problems. Because "git repack -a -d -l" (which is what "git gc" 
does) will always gather up any loose objects even from remote sites, but 
the "-l" means that it will not do so for alternate packed objects.

So what happens is that if one of the repositories can reach some object 
that is in a pack in the other repository, "git gc" will still *leave* it 
dependent on a pack in the other repository. But maybe that object isn't 
even reachable in the other repo any more (for whatever reason - a rebase, 
whatever), then when you repack the other repository, now all the packs 
will be replaced by one new pack - and the one new pack will only contain 
the objects reachable from the other repo.

IOW: alternates are dangerous. A shared object directory is dangerous. You 
should basically only do it under very controlled circumstances, and 
otherwise you should use either hardlinks or if you want added safety, 
totally separate repositories.

Basically, here's an example of badness, with A and B being repos that 
point to each other.

 - do something in A
 - pull it into B - this leaves the objects in A, because of the 
   alternates link.
 - rebase A
 - "git gc" in A: this removes unreachable objects from A, and now B is 
   screwed.

So the rule really is: never *ever* do anything but fast-forward in a repo 
that is an alternate for another one. If you do a circular link, I think 
it's still safe if you follow that rule, but now obviously the rule holds 
for *both* repos (and quite frankly, I'd worry so much that I'd never do 
it even then).

There should be another rule too: git on its own is not a backup system. 
You can use git *as* a backup system, but you need to do so by mirroring 
the whole repository, and not on the same disk.

(ie, for me, git *is* a backup system, but that's only because I push my 
repos to other sites - a single git repo on its own has zero redundancy)

		Linus

* Re: git branch performance problem?
From: Han-Wen Nienhuys @ 2007-10-11  2:26 UTC
  To: Linus Torvalds; +Cc: Lars Hjemli, git

2007/10/10, Linus Torvalds <torvalds@linux-foundation.org>:

> IOW: alternates are dangerous. A shared object directory is dangerous. You
> should basically only do it under very controlled circumstances, and
> otherwise you should use either hardlinks or if you want added safety,
> totally separate repositories.

I recall reading a few months ago that it was "clone -l" that gave you
the jeebies, rather than "clone -s".


> So the rule really is: never *ever* do anything but fast-forward in a repo
>[..]

Methinks this is all too difficult. I will use clone -l henceforth. Is
there any reason to prefer -s over -l? Given your lengthy exposition
on the dangers of alternates, I would say this is a feature that
deserves to be buried or at least deemphasized in the documentation.

For cherrypicking convenience, I would still appreciate it if there
was a mechanism similar to alternates that would allow me to view
objects from an alternate repo; objects found through this mechanism
should never be assumed to be present in the database, of course.


-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen

* Re: git branch performance problem?
From: Alex Riesen @ 2007-10-11  6:41 UTC
  To: hanwen; +Cc: Linus Torvalds, Lars Hjemli, git

Han-Wen Nienhuys, Thu, Oct 11, 2007 04:26:24 +0200:
> > So the rule really is: never *ever* do anything but fast-forward in a repo
> >[..]
> 
> Methinks this is all too difficult. I will use clone -l henceforth.

It is the current default for local clones.

* Re: Spam: Re: git branch performance problem?
From: Mike Ralphson @ 2007-10-11  9:41 UTC
  To: Brandon Casey; +Cc: Lars Hjemli, J. Bruce Fields, hanwen, git

On 10/10/07, Brandon Casey <casey@nrlssc.navy.mil> wrote:
> No, this is not the case, unless something has changed very recently
> in git-gc or git-repack. Even git-gc with no arguments is unsafe if
> the repository being gc'ed is listed in another's alternates.
>
> git-gc calls repack with -a and -d, which causes a new pack to be
> created which only contains the objects required by the local repository.
> The other packs are then deleted. Objects contained in those packs and
> required by a "sharing" repository (one using the alternates mechanism)
> will be deleted if the local repository no longer references them.

It's not something I've really looked into, but there seems to be a
reflogs mechanism which can temporarily pin an otherwise unreferenced
object so it doesn't get deleted. Would it be possible to populate the
remote's view of referenced objects into this, at the point of clone,
push or pull, which would seem to be the points at which this might be
changing?

Obviously this is of no use if you're 'anonymously' poncing off a
third repo to save clone time, but if you're in control of both repos
it might be useful.

Mike

* Re: git branch performance problem?
From: Johannes Schindelin @ 2007-10-11 10:46 UTC
  To: hanwen; +Cc: Linus Torvalds, Lars Hjemli, git

Hi,

On Wed, 10 Oct 2007, Han-Wen Nienhuys wrote:

> For cherrypicking convenience, I would still appreciate it if there was 
> a mechanism similar to alternates that would allow me to view objects 
> from an alternate repo; objects found through this mechanism should 
> never be assumed to be present in the database, of course.

Silly question: why don't you just

	git remote add -f other <url>

and then review the changes with "git log", "git diff" and "git show"?

Ciao,
Dscho
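A self-contained illustration of that workflow, assuming throwaway repositories under a temporary directory and a fixed demo identity; the path "$tmp/other" stands in for <url>:

```shell
#!/bin/sh
# Demo: instead of borrowing objects through alternates, fetch the other
# repository as a remote and inspect it with log/diff/show.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# "other" stands in for the repository at <url>.
git init -q "$tmp/other"
cd "$tmp/other"
git symbolic-ref HEAD refs/heads/master
echo one > file && git add file && git commit -q -m base

# Our own repository starts from the same history ...
git clone -q "$tmp/other" "$tmp/mine"

# ... and then "other" moves ahead by one commit.
echo two >> file && git commit -q -a -m "fix in other"

# Add it as a remote and fetch right away (-f).
cd "$tmp/mine"
git remote add -f other "$tmp/other" >/dev/null

# Review what other/master has that our master lacks.
git log --oneline master..other/master
git diff --stat master other/master
git show --stat other/master
```

Nothing here touches objects/info/alternates: the fetched objects are copied into the local object store, so a gc in either repository cannot break the other.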


* Re: Spam: Re: git branch performance problem?
  2007-10-11  9:41               ` Mike Ralphson
@ 2007-10-11 10:58                 ` Johannes Schindelin
  0 siblings, 0 replies; 25+ messages in thread
From: Johannes Schindelin @ 2007-10-11 10:58 UTC (permalink / raw)
  To: Mike Ralphson; +Cc: Brandon Casey, Lars Hjemli, J. Bruce Fields, hanwen, git

Hi,

On Thu, 11 Oct 2007, Mike Ralphson wrote:

> It's not something I've really looked into, but there seems to be a
> reflogs mechanism which can temporarily pin an otherwise unreferenced
> object so it doesn't get deleted. Would it be possible to populate the
> remote's view of referenced objects into this, at the point of clone,
> push or pull, which would seem to be the points at which this might be
> changing?
> 
> Obviously this is of no use if you're 'anonymously' poncing off a
> third repo to save clone time, but if you're in control of both repos
> it might be useful.

I cannot really claim that I understood what you were trying to say, but 
I guess you want to use clone to get rid of objects you just threw out by 
either filter-branch or deleting a branch.

The answer is that the file:// as well as the git:// protocol will do 
that.  For local clones, they are not the default, since they are slower 
than hardlinking.

Hth,
Dscho
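The difference between a plain-path local clone and a file:// clone can be checked directly. This is a minimal demo under stated assumptions (throwaway repositories, a dangling blob standing in for objects thrown out by filter-branch or a deleted branch): the path clone copies or hardlinks the whole object directory, while the file:// clone goes through the pack protocol and only receives objects reachable from the advertised refs.

```shell
#!/bin/sh
# Demo: which clone method carries unreachable objects along?
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/src"
cd "$tmp/src"
echo keep > file && git add file && git commit -q -m base

# A dangling blob: it exists in the object store, but no ref reaches it.
junk=$(echo "thrown away" | git hash-object -w --stdin)

git clone -q "$tmp/src" "$tmp/by-path"        # local clone: objects dir hardlinked/copied
git clone -q "file://$tmp/src" "$tmp/by-url"  # pack protocol, as with git://

# by-path still has the dangling blob; by-url does not.
git -C "$tmp/by-path" cat-file -e "$junk" && echo "by-path: has $junk"
git -C "$tmp/by-url" cat-file -e "$junk" 2>/dev/null || echo "by-url: no $junk"
```

Later versions of git also accept "git clone --no-local /path/to/repo" to force the regular transport for a plain path.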


* Re: git branch performance problem?
  2007-10-11 10:46             ` Johannes Schindelin
@ 2007-10-11 13:11               ` Han-Wen Nienhuys
  0 siblings, 0 replies; 25+ messages in thread
From: Han-Wen Nienhuys @ 2007-10-11 13:11 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, Lars Hjemli, git

2007/10/11, Johannes Schindelin <Johannes.Schindelin@gmx.de>:
> > For cherrypicking convenience, I would still appreciate it if there was
> > a mechanism similar to alternates that would allow me to view objects
> > from an alternate repo; objects found through this mechanism should
> > never be assumed to be present in the database, of course.
>
> Silly question: why don't you just
>
>         git remote add -f other <url>
>
> and then review the changes with "git log", "git diff" and "git show"?

Thanks for the tip; I'll look into it.

-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen


* Re: git branch performance problem?
  2007-10-11  2:26           ` Han-Wen Nienhuys
  2007-10-11  6:41             ` Alex Riesen
  2007-10-11 10:46             ` Johannes Schindelin
@ 2007-10-11 15:16             ` Linus Torvalds
  2007-10-12 17:19             ` Salikh Zakirov
  3 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2007-10-11 15:16 UTC (permalink / raw)
  To: hanwen; +Cc: Lars Hjemli, git



On Wed, 10 Oct 2007, Han-Wen Nienhuys wrote:
> 
> I recall reading a few months ago that it was "clone -l" that gave you
> the jeebies, rather than "clone -s".

Yes, "clone -l" gives me the jeebies, because I'm a totally anal person 
when it comes to disk corruption and a worry-wart. I've just had it happen 
too many times (usually because a disk simply goes bad), and "git clone 
-l" basically means that if one repository gets corrupted, then so does 
the other one.

But clone -s gives me even *more* jeebies, although I think it's in some 
respect also more useful. The alternates thing is really useful for 
servers in particular, where you basically want to have multiple 
"branches" maintained by lots of people, but all based on some expected 
base version.

So if you think of alternates as a "kernel.org" or "repo.or.cz" thing, 
where you might have a hundred different repositories all based on the 
same "standard" version, then I think you basically have the right model. 
In that situation, "git clone -l" doesn't work that well, since the 
repositories just start out sharing data, but don't do it long term.

So "git clone -l" (which is the default now - my jeebies really are my 
personal psychological problem) is really useful for latency reasons for a 
local clone, and has basically no real downsides. It's not useful for 
*backups*, but it's useful for development.

> > So the rule really is: never *ever* do anything but fast-forward in a repo
> >[..]
> 
> Methinks this is all too difficult. I will use clone -l henceforth. Is
> there any reason to prefer -s over -l?

Good. And no, for actual *development* there is no reason to prefer -s 
over -l (and as mentioned, '-l' is the default in modern versions).

For a git *server* setup, -s is better, since it's more long-term. But in 
that situation, it also requires that the server maintainer have some 
rules (ie only use "-s" for stable base trees and/or use extra care when 
repacking the base).
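What "-l" and "-s" actually leave on disk can be compared in a few lines. This is a demo sketch with throwaway repositories: the "-l" clone gets its own (hardlinked, when on the same filesystem) copy of the objects, so a bad disk block in a shared object file hits both repositories, while the "-s" clone copies nothing and instead records the base's object directory in objects/info/alternates, depending on it for as long as it exists.

```shell
#!/bin/sh
# Demo: compare what "git clone -l" and "git clone -s" leave on disk.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/base"
cd "$tmp/base"
echo hello > file && git add file && git commit -q -m base

# -l: objects are hardlinked (or copied) into the clone's own store.
git clone -q -l "$tmp/base" "$tmp/local"

# -s: no objects are copied; the clone points at the base via alternates.
git clone -q -s "$tmp/base" "$tmp/shared"

echo "alternates of the -s clone:"
cat "$tmp/shared/.git/objects/info/alternates"
echo "loose objects in each clone:"
git -C "$tmp/local" count-objects
git -C "$tmp/shared" count-objects
```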

> Given your lengthy exposition on the dangers of alternates, I would say 
> this is a feature that deserves to be buried or at least de-emphasized 
> in the documentation.

I do agree. We should make the dangers very clear.

> For cherrypicking convenience, I would still appreciate it if there
> was a mechanism similar to alternates that would allow me to view
> objects from an alternate repo; objects found through this mechanism
> should never be assumed to be present in the database, of course.

Well, the way that really should work is that you "git fetch remote" and 
work on the end result in a "remote branch".

That *will* make the objects present in the database, but not in your 
actual branches (until you cherry-pick), but there really are no real 
downsides. If the remote is truly related to your local tree, it all 
deltas so well that disk space should basically be a non-issue.

		Linus
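The fetch-then-cherry-pick workflow can be sketched end to end. This demo assumes throwaway repositories, a fixed demo identity, and a remote named "other" chosen for illustration; the remote gains two commits, and only the second one is picked onto local work:

```shell
#!/bin/sh
# Demo: fetch a related repository into remote-tracking branches and
# cherry-pick a single commit from it.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/remote"
cd "$tmp/remote"
git symbolic-ref HEAD refs/heads/master
echo one > file && git add file && git commit -q -m base
git clone -q "$tmp/remote" "$tmp/local"

# The remote grows two commits; only the second one is wanted locally.
echo a > a.txt && git add a.txt && git commit -q -m "unrelated feature"
echo two >> file && git commit -q -a -m "fix wanted locally"

cd "$tmp/local"
echo mine > mine.txt && git add mine.txt && git commit -q -m "local work"

# The objects land in the local database, but no local branch moves ...
git remote add other "$tmp/remote"
git fetch -q other

# ... until a specific commit is picked onto the current branch.
git cherry-pick other/master >/dev/null
git log --oneline
```

After this, the unrelated feature's objects sit in the local database under other/master, but a.txt never appears in the working tree.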


* Re: git branch performance problem?
  2007-10-11  2:26           ` Han-Wen Nienhuys
                               ` (2 preceding siblings ...)
  2007-10-11 15:16             ` Linus Torvalds
@ 2007-10-12 17:19             ` Salikh Zakirov
  3 siblings, 0 replies; 25+ messages in thread
From: Salikh Zakirov @ 2007-10-12 17:19 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Lars Hjemli, git

Han-Wen Nienhuys wrote:
> For cherrypicking convenience, I would still appreciate it if there
> was a mechanism similar to alternates that would allow me to view
> objects from an alternate repo; objects found through this mechanism
> should never be assumed to be present in the database, of course.

There exists a script, contrib/workdir/git-new-workdir,
which creates a new working copy that literally shares the same object store.
It shares both the object store and the branches, so some care must be taken:
a branch which is checked out in one shared working directory must never be
updated (committed to or pulled into) from the other shared working directory.

That said, I personally find this trick very useful for browsing alternate
branch code and quick bug fixing.
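Later git versions (2.5 onward, long after this thread) grew "git worktree", which offers several working directories over one object store, much like git-new-workdir, but refuses to check out a branch that is already checked out elsewhere. The demo below is a sketch using that swapped-in command with throwaway repositories; the contrib script itself is only shown in a comment with its usage line.

```shell
#!/bin/sh
# Demo: multiple working directories over one object store via
# "git worktree" (a later replacement, not the contrib script itself).
# The contrib script of the time was invoked as:
#   git-new-workdir <repository> <new_workdir> [<branch>]
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/main"
cd "$tmp/main"
git symbolic-ref HEAD refs/heads/master
echo base > file && git add file && git commit -q -m base

# A second working directory on a new branch for quick browsing/fixing.
git worktree add -b quickfix "$tmp/quickfix" >/dev/null 2>&1
echo fix >> "$tmp/quickfix/file"
git -C "$tmp/quickfix" commit -q -a -m "quick fix"

# The commit is immediately visible from the main working directory.
git log --oneline -1 quickfix --

# Checking out master a second time is refused, which sidesteps the
# "never update a branch checked out elsewhere" caveat.
if git worktree add "$tmp/second" master >/dev/null 2>&1; then
    echo "unexpected: master checked out twice"
else
    echo "refused: master is already checked out"
fi
```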


Thread overview: 25+ messages
2007-10-10 20:22 git branch performance problem? Han-Wen Nienhuys
2007-10-10 20:44 ` Lars Hjemli
2007-10-10 21:17   ` Han-Wen Nienhuys
2007-10-10 21:24     ` Han-Wen Nienhuys
2007-10-10 21:30       ` Han-Wen Nienhuys
2007-10-10 21:39         ` J. Bruce Fields
2007-10-10 21:45           ` Lars Hjemli
2007-10-10 21:49             ` Han-Wen Nienhuys
2007-10-10 21:53               ` J. Bruce Fields
2007-10-10 22:01                 ` Han-Wen Nienhuys
2007-10-10 21:53               ` Johannes Schindelin
2007-10-10 22:55             ` Spam: " Brandon Casey
2007-10-11  9:41               ` Mike Ralphson
2007-10-11 10:58                 ` Johannes Schindelin
2007-10-10 23:39         ` Linus Torvalds
2007-10-11  2:26           ` Han-Wen Nienhuys
2007-10-11  6:41             ` Alex Riesen
2007-10-11 10:46             ` Johannes Schindelin
2007-10-11 13:11               ` Han-Wen Nienhuys
2007-10-11 15:16             ` Linus Torvalds
2007-10-12 17:19             ` Salikh Zakirov
2007-10-10 21:34       ` Lars Hjemli
2007-10-10 21:54       ` [PATCH] git-branch: only traverse the requested refs Lars Hjemli
2007-10-10 23:00         ` Johannes Schindelin
2007-10-10 23:30           ` Lars Hjemli
