* Restraining git pull/fetch to the current branch
@ 2007-01-11 21:47 Julian Phillips
  2007-01-12  0:59 ` Shawn O. Pearce
  2007-01-12  1:09 ` Junio C Hamano
  0 siblings, 2 replies; 5+ messages in thread
From: Julian Phillips @ 2007-01-11 21:47 UTC (permalink / raw)
  To: git

While trying out git on a large repository (10000s of commits, 1000s of 
branches, ~2.5Gb when packed) at work I noticed that doing a pull was 
taking a long time (longer than I was prepared to wait anyway).

A quick test showed that a small repository (1 commit, 24k .git/objects) 
with 1000 branches took 1m30 to do "git pull" (local xfs partition).  I 
don't know if this is reasonable or not, but all I actually cared 
about was updating the current branch, which "git pull origin 
<branch_name>" did in 0.3s.

So what I would like to know is: is there any way to make a pull/fetch 
with no options default to only fetching the current branch? (other than 
scripting "git pull/fetch origin $(git symbolic-ref HEAD)" that is)

TIA

-- 
Julian

  ---
You can't go home again, unless you set $HOME.


* Re: Restraining git pull/fetch to the current branch
  2007-01-11 21:47 Restraining git pull/fetch to the current branch Julian Phillips
@ 2007-01-12  0:59 ` Shawn O. Pearce
  2007-01-12  1:09 ` Junio C Hamano
  1 sibling, 0 replies; 5+ messages in thread
From: Shawn O. Pearce @ 2007-01-12  0:59 UTC (permalink / raw)
  To: Julian Phillips; +Cc: git

Julian Phillips <julian@quantumfyre.co.uk> wrote:
> While trying out git on a large repository (10000s of commits, 1000s of 
> branches, ~2.5Gb when packed) at work I noticed that doing a pull was 
> taking a long time (longer than I was prepared to wait anyway).
> 
> A quick test showed that a small repository (1 commit, 24k .git/objects) 
> with 1000 branches took 1m30 to do "git pull" (local xfs partition).  I 
> don't know if this is reasonable or not, but all I actually cared 
> about was updating the current branch, which "git pull origin 
> <branch_name>" did in 0.3s.
> 
> So what I would like to know is: is there any way to make a pull/fetch 
> with no options default to only fetching the current branch? (other than 
> scripting "git pull/fetch origin $(git symbolic-ref HEAD)" that is)

No, but fortunately bash has a fancy alias tool:

	alias gp='git pull origin $(git symbolic-ref HEAD)'

perhaps your shell can help.  :-)
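
If the alias has to cope with a detached HEAD, a small shell function is one option. The following is only a minimal sketch: the names short_branch and pull_current are made up for illustration, the remote is assumed to be called origin, and it assumes a git whose symbolic-ref accepts -q.

```shell
# Hypothetical helper: turn a full ref name into a short branch name.
# Fails (returns 1) for anything that is not under refs/heads/.
short_branch() {
    case "$1" in
        refs/heads/*) printf '%s\n' "${1#refs/heads/}" ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: pull only the branch that HEAD points at.
# On a detached HEAD, symbolic-ref -q fails quietly and nothing is pulled.
pull_current() {
    ref=$(git symbolic-ref -q HEAD) || { echo "HEAD is detached" >&2; return 1; }
    branch=$(short_branch "$ref") || return 1
    git pull origin "$branch"
}
```

Run pull_current inside the working tree in place of a bare "git pull".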


Life is going to be painful with that repository with current Git
(1.5.0 and later) as the new default configuration for a clone is to
copy every branch into refs/remotes/origin/*, where * is wildcarded
against the current set of branches on the remote repository.
If that takes "a long time" you will be processing a lot of refs
you don't care about (or need to care about).

-- 
Shawn.


* Re: Restraining git pull/fetch to the current branch
  2007-01-11 21:47 Restraining git pull/fetch to the current branch Julian Phillips
  2007-01-12  0:59 ` Shawn O. Pearce
@ 2007-01-12  1:09 ` Junio C Hamano
  2007-01-12 14:08   ` Julian Phillips
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2007-01-12  1:09 UTC (permalink / raw)
  To: Julian Phillips; +Cc: git

Julian Phillips <julian@quantumfyre.co.uk> writes:

> While trying out git on a large repository (10000s of commits, 1000s
> of branches, ~2.5Gb when packed) at work I noticed that doing a pull
> was taking a long time (longer than I was prepared to wait anyway).

Are they all real branches?  In other words, does your project
have 1000s of active parallel lines of development?

> So what I would like to know is: is there any way to make a pull/fetch
> with no options default to only fetching the current branch? (other
> than scripting "git pull/fetch origin $(git symbolic-ref HEAD)" that
> is)

Also, assuming the answer to the above question is yes, will you
have 1000s of branches on your end and will work on any one of
them?

The default configuration created by git-clone makes you track
all branches from the remote side by putting:

	remote.origin.fetch = +refs/heads/*:refs/remotes/origin/*

If you do not care about all 1000s of branches but are only
interested in a select few, you could change that configuration to
suit your needs better.

    remote.origin.fetch = +refs/heads/stable:refs/remotes/origin/stable
    remote.origin.fetch = +refs/heads/testing:refs/remotes/origin/testing
    remote.origin.fetch = +refs/heads/unstable:refs/remotes/origin/unstable
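
One way to write such a configuration without editing .git/config by hand is sketched below. It is an illustration under assumptions rather than a tested recipe: refspecs_for and track_only are hypothetical names, the remote is assumed to be called origin, and track_only must be run inside the clone.

```shell
# Hypothetical helper: print one fetch refspec per branch name given.
refspecs_for() {
    for b in "$@"; do
        printf '+refs/heads/%s:refs/remotes/origin/%s\n' "$b" "$b"
    done
}

# Hypothetical helper: replace every existing remote.origin.fetch entry
# (including the wildcard one) with refspecs for the named branches only.
track_only() {
    git config --unset-all remote.origin.fetch
    refspecs_for "$@" | while IFS= read -r spec; do
        git config --add remote.origin.fetch "$spec"
    done
}
```

For example, "track_only stable testing unstable" would leave the three entries shown above.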


I suspect most of the time is being spent in the
append-fetch-head loop in fetch_main shell function in
git-fetch.sh.  The true fix would not be to limit the number of
branches updated, but to speed that part of the code up.


* Re: Restraining git pull/fetch to the current branch
  2007-01-12  1:09 ` Junio C Hamano
@ 2007-01-12 14:08   ` Julian Phillips
  2007-01-15 13:06     ` Julian Phillips
  0 siblings, 1 reply; 5+ messages in thread
From: Julian Phillips @ 2007-01-12 14:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, 11 Jan 2007, Junio C Hamano wrote:

> Julian Phillips <julian@quantumfyre.co.uk> writes:
>
>> While trying out git on a large repository (10000s of commits, 1000s
>> of branches, ~2.5Gb when packed) at work I noticed that doing a pull
>> was taking a long time (longer than I was prepared to wait anyway).
>
> Are they all real branches?  In other words, does your project
> have 1000s of active parallel lines of development?

(Oops, overenthusiastic with the 0 there, I mean 100s of branches, about 
880 atm).

They are mostly topic style branches, with only 20 or so in active use at 
any one time.  The idea of having to cope with 100s of active branches at 
the same time (given that we currently are using subversion) is quite 
frankly terrifying.

> Also, assuming the answer to the above question is yes, will you
> have 1000s of branches on your end and will work on any one of
> them?

It would be necessary to have access to all of the currently active 
branches at least, with the added complication that the set of current 
active branches changes quite rapidly.

> If you do not care about all 1000s of branches but are only
> interested in a select few, you could change that configuration to
> suit your needs better.

I think the problem here would be keeping track of which branches are 
currently active.  Some scheme could probably be derived, but I was hoping 
that fetching an unchanged branch would be sufficiently fast that it would 
not be necessary.  I appear to have been wrong :(

> I suspect most of the time is being spent in the
> append-fetch-head loop in fetch_main shell function in
> git-fetch.sh.  The true fix would not be to limit the number of
> branches updated, but to speed that part of the code up.

Indeed, each call to append_fetch_head is taking ~1.7s (~1.5s user, ~0.2s 
sys).  So simply looping over all the branches explains the ~27m that a 
complete fetch takes. (This is for fetch with no updates).  Given that a 
"clone orig new" takes ~8m30 (half of which would seem to be IO), it looks 
like it may be faster to create a new repository each time instead of 
updating the old one, which is certainly a viable workaround - but might 
imply that fetch has some room for improvement?
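
As a rough sanity check of those numbers, taking the ~880 branches mentioned above and ~1.7s per call (the helper name estimate_fetch is hypothetical, and the per-call time is passed in tenths of a second because shell arithmetic is integer-only):

```shell
# Estimate the total time for $1 branches at $2 tenths of a second per call.
estimate_fetch() {
    total=$(( $1 * $2 / 10 ))          # whole seconds
    printf '%dm%02ds\n' $(( total / 60 )) $(( total % 60 ))
}

estimate_fetch 880 17    # prints 24m56s
```

That is in the same ballpark as the ~27m observed, so the per-branch calls alone account for most of the time.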

-- 
Julian

  ---
There is nothing stranger in a strange land than the stranger who comes
to visit.


* Re: Restraining git pull/fetch to the current branch
  2007-01-12 14:08   ` Julian Phillips
@ 2007-01-15 13:06     ` Julian Phillips
  0 siblings, 0 replies; 5+ messages in thread
From: Julian Phillips @ 2007-01-15 13:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, 12 Jan 2007, Julian Phillips wrote:

>>  I suspect most of the time is being spent in the
>>  append-fetch-head loop in fetch_main shell function in
>>  git-fetch.sh.  The true fix would not be to limit the number of
>>  branches updated, but to speed that part of the code up.
>
> Indeed, each call to append_fetch_head is taking ~1.7s (~1.5s user, ~0.2s 
> sys).  So simply looping over all the branches explains the ~27m that a 
> complete fetch takes. (This is for fetch with no updates).  Given that a 
> "clone orig new" takes ~8m30 (half of which would seem to be IO), it looks 
> like it may be faster to create a new repository each time instead of 
> updating the old one, which is certainly a viable workaround - but might 
> imply that fetch has some room for improvement?

I have had a chance to spend a little more time looking at this.  It would 
appear that the major culprit is show-ref.

Running "git show-ref --hash <ref>" takes ~1.7s, compared to 0.002s for 
"cat $GIT_DIR/<ref>".  If I add the following to the top of 
append_fetch_head a null fetch takes 1m28s instead of ~27m.

# Return early when the local ref already matches the fetched head.
local_head_=$(cat "$GIT_DIR/$local_name_")
if [ "$head_" = "$local_head_" ]; then
	return
fi

Looking at the code for show-ref, it appears to look at all the refs to 
find the one you ask for.  This makes fetch O(n^2) in the number of 
branches, which would seem not strictly necessary - but then I am not 
really familiar with the internal workings of git.  I noticed that the man 
page for show-ref says that its use over direct access is encouraged, but 
as it stands it is far too slow to be used in fetch when you have a large 
many-branched repository...
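
To illustrate the kind of single-pass comparison that would avoid the quadratic behaviour, here is a sketch. It is not how git-fetch.sh works: changed_refs is a hypothetical helper, and it assumes two files of "<sha> <refname>" lines that already use the same ref names on both sides (for instance, output of git for-each-ref with a suitable --format, and a remote listing mapped to the local names).

```shell
# Hypothetical helper: print the refs in file $2 whose hash differs from,
# or is missing in, file $1.  Each file is read exactly once, so the work
# grows linearly with the number of refs.
changed_refs() {
    awk 'FILENAME == ARGV[1] { have[$2] = $1; next }
         # Appending "" forces a string comparison of the two hashes.
         (have[$2] "") != ($1 "") { print $2 }' "$1" "$2"
}
```

Only the refs it prints would then need the expensive per-ref handling.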

(In looking at this I also discovered that if you have too many branches 
then fetch will die with a too long command line error when calling 
git-fetch-pack.)

-- 
Julian

  ---
I will never lie to you.
