Git development
* Re: [GUILT PATCH 2/5] guilt-guard: Assign guards to patches in series
From: Josef Sipek @ 2007-08-09 13:47 UTC (permalink / raw)
  To: Eric Lesh; +Cc: git
In-Reply-To: <87bqdhnotj.fsf@hubert.paunchy.net>

On Thu, Aug 09, 2007 at 12:34:48AM -0700, Eric Lesh wrote:
> [ I'm finally back to this.  Thanks for your comments. ]

Good. I was starting to get worried :)

> Josef Sipek <jsipek@fsl.cs.sunysb.edu> writes:
> 
> [...]
> 
> >> +}
> >> +
> >> +# usage: set_guards <patch> <guards...>
> >> +set_guards()
> >> +{
> >> +	p="$1"
> >
> > Again, be careful about namespace pollution.
> >
> 
> Can I use "local", or is it a bashism?  If not, use parentheses around
> the function body?

Right, "local" is a bashism, so you must use a subshell (parentheses).

> >> +	shift
> >> +	for x in "$@"; do
> >> +		if [ -z $(printf %s "$x" | grep -e "^[+-]") ]; then
> >
> > Out of curiosity, why printf and not echo?
> >
> 
> For guards named '-e' or other funky things echo doesn't like and can't
> process with echo --.

Good enough reason :)

...
> I'm trying to clean the rest and get it ready again. This whole series
> will definitely need to incubate for a while once there's a
> reasonable-looking version, to make sure nothing goes crazy.  Hopefully
> it ends up being useful somewhere!

I'd use it at times. For certain scenarios (2 series that are mostly
identical) using guards makes more sense than different branches.

Thanks,

Josef 'Jeff' Sipek.

-- 
Humans were created by water to transport it upward.

^ permalink raw reply

* Re: Git on MSys (or how to make it easy for Windows users to compile git)
From: Alex Riesen @ 2007-08-09 13:55 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Marius Storm-Olsen, Torgil Svensson, Dmitry Kakurin, git
In-Reply-To: <Pine.LNX.4.64.0708091013430.21857@racer.site>

On 8/9/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > Very impressive.
>
> I have some issues with it, though.
>
> - I really hate to leave anything older than W2K behind.  You might not
>   care, but I do.
>
> - I tested it, and it gave a constant flicker, at least in the status bar.
>   Does not seem to be that fleshed out.

That's typical for impressive windows hackery...
Switching the transparency completely (frame and background in tabs) off
seems to help with the flicker. At least it does not flicker much for w2k here.

^ permalink raw reply

* Re: 'pu' branch for StGIT
From: Karl Hasselström @ 2007-08-09 14:18 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: git, Catalin Marinas, Yann Dirson
In-Reply-To: <1186665883.28228.31.camel@dv>

On 2007-08-09 09:24:43 -0400, Pavel Roskin wrote:

> On Thu, 2007-08-09 at 09:38 +0200, Karl Hasselström wrote:
>
> > I take it this all means you're actually using my branch? What's
> > your opinion on its usefulness?
>
> Well, I tried it, and then ran a script to update all local
> repositories. It converted everything to "version 3", so I'm sort of
> stuck with it. If the "version 3" code is not committed to the
> mainline StGIT, I'll have to convert my repositories back or even
> re-fetch them.

Thanks for the vote of confidence. :-)

You should be able to do something like

  $ stg applied > .git/patches/branch/applied
  $ stg unapplied > .git/patches/branch/unapplied

and then manually change the version from 3 to 2, and be ready to go.
I haven't tested this, though!

> I have noticed two problems so far, but I cannot tell if they are
> specific to the "pu" branch.
>
> 1) Undead patches.

I saw the same problem today. I haven't had time to look into it, but
I believe it's due to stgit trying to directly modify files under
.git/refs instead of using git-update-ref, which breaks with packed
refs. The DAG patches rely much more on the refs, so the bug is more
severe in that case.

https://gna.org/bugs/?9710

> There is also a file .git/patches/wireless-dev/patchorder, which
> contains "at76_usb".

The patchorder file should be harmless. It's only used to determine
patch order for those cases where the DAG information isn't
sufficient. (That is, for unapplied patches.) It's strictly advisory,
and _not_ used to determine which patches exist.

> I was updating the repository by "stg pull", there were two patches,
> "at76_usb" being first. It couldn't be merged, so I deleted it. I
> deleted the other patch as well, since I knew it was applied
> upstream. After another "stg pull" at76_usb became "undead".

Until this is fixed, you can use git-show-ref and git-update-ref to
manually delete the offending ref. That fixed the problem for me.

> 2) Invisible branches.

I haven't seen this problem at all -- in my repositories, "stg branch
-l" just works. Will try to reproduce (hopefully tonight). Do you have
a recipe on how to reproduce this from scratch?

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply

* Re: 'pu' branch for StGIT
From: Karl Hasselström @ 2007-08-09 14:24 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: git, Catalin Marinas, Yann Dirson
In-Reply-To: <20070809141848.GA6342@diana.vm.bytemark.co.uk>

On 2007-08-09 16:18:48 +0200, Karl Hasselström wrote:

> I saw the same problem today. I haven't had time to look into it,
> but I believe it's due to stgit trying to directly modify files
> under .git/refs instead of using git-update-ref, which breaks with
> packed refs. The DAG patches rely much more on the refs, so the bug
> is more severe in that case.

git-gc started packing refs by default in late May. That's probably
what's caused it to surface.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply

* Bug in git-svn: dcommit commits in the wrong branch after a rebase
From: Benoit SIGOURE @ 2007-08-08 21:35 UTC (permalink / raw)
  To: git; +Cc: Eric Wong

[-- Attachment #1: Type: text/plain, Size: 2009 bytes --]

Hi,
I was working with Git on a SVN branch `a' (say my Git branch is  
`mya') and I wanted to create a new SVN branch `b' and dcommit my  
changes there for others (poor SVN users) to see.  So I did a svn cp  
url://a url://b to create the branch `b' in SVN, git-svn fetch to  
import this branch, git checkout -b myb b and then rebased mya and  
then did a dcommit.  Although the last commit at this point (in  
branch myb) had a svn-id "pointing to" the SVN branch b, dcommit sent  
the commits to the branch `a'.

Test case:

svnadmin create repos
svn co file://`pwd`/repos wc
cd wc
svn mkdir branches
svn mkdir branches/a
echo foo >branches/a/foo
svn add branches/a/foo
svn ci -m 'branch a'
cd ..
git-svn clone --branches=branches file://`pwd`/repos wc.git
cd wc.git
echo git is cool >>foo
git-commit -a -m 'commit in git in branch a'
cd ../wc
svn cp branches/a branches/b
svn ci -m 'branch b'
cd ../wc.git
git-svn fetch
git-checkout -b myb b
git-rebase master
git-svn dcommit # sends the commit to SVN branch `a' instead of SVN branch `b'!

Temporary workaround (in case someone finds this post after stumbling  
on this problem):
svn mv branches/a branches/tmp
<commit>
svn mv branches/b branches/a
<commit>
svn mv branches/tmp branches/b
<commit>

After this, git-svn fetch will slightly complain but it will work  
nevertheless.

Found possible branch point: url://repo/branches/a => url://repo/branches/tmp, <N>
Found branch parent: (b) <sha1>
Following parent with do_switch
Successfully followed parent
r<N> = <sha1> (b)
Found possible branch point: url://repo/branches/b => url://repo/branches/a, <N+1>
Found branch parent: (a) <sha1-X>
Index mismatch: <sha1> != <sha1>
rereading <sha1-X>
Following parent with do_switch
Successfully followed parent
r<N+1> = <sha1> (a)
[...]

Despite the `Index mismatch' sort of warning, the Git repo seems to  
be correct.

Cheers,

PS: I use git version 1.5.3.rc3.25.gf9208-dirty

-- 
Benoit Sigoure aka Tsuna
EPITA Research and Development Laboratory



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 186 bytes --]

^ permalink raw reply

* Re: Bug in git-svn: dcommit commits in the wrong branch after a rebase
From: Benoit SIGOURE @ 2007-08-09 15:45 UTC (permalink / raw)
  To: git; +Cc: Eric Wong
In-Reply-To: <21FC6D7F-5459-406D-AA06-D16E525B3C17@lrde.epita.fr>

[-- Attachment #1: Type: text/plain, Size: 914 bytes --]

On Aug 8, 2007, at 11:35 PM, Benoit SIGOURE wrote:

> Test case:
>
> svnadmin create repos
> svn co file://`pwd`/repos wc
> cd wc
> svn mkdir branches
> svn mkdir branches/a
> echo foo >branches/a/foo
> svn add branches/a/foo
> svn ci -m 'branch a'
> cd ..
> git-svn clone --branches=branches file://`pwd`/repos wc.git
> cd wc.git
> echo git is cool >>foo
> git-commit -a -m 'commit in git in branch a'
> cd ../wc
> svn cp branches/a branches/b
> svn ci -m 'branch b'
> cd ../wc.git
> git-svn fetch


> git-checkout -b myb b
> git-rebase master
> git-svn dcommit # sends the commit to SVN branch `a' instead of SVN  
> branch `b'!
>

Actually the test case is wrong, it should end in:
git-rebase b # while being on `master' branch
git-svn dcommit # this works as expected

Thanks to siprbaum for spotting this on IRC.  Sorry for the noise.

-- 
Benoit Sigoure aka Tsuna
EPITA Research and Development Laboratory



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 186 bytes --]

^ permalink raw reply

* Re: Bug in git-svn: dcommit commits in the wrong branch after a rebase
From: Steven Grimm @ 2007-08-09 15:46 UTC (permalink / raw)
  To: Benoit SIGOURE; +Cc: git, Eric Wong
In-Reply-To: <21FC6D7F-5459-406D-AA06-D16E525B3C17@lrde.epita.fr>

Benoit SIGOURE wrote:
> git-svn fetch
> git-checkout -b myb b
> git-rebase master
> git-svn dcommit # sends the commit to SVN branch `a' instead of SVN 
> branch `b'!

That's exactly what I would expect to happen. The "git-rebase" is the 
key here; it is effectively telling git to switch back to your master 
branch. Try running "git log" before and after the rebase command and 
you should get a slightly better idea of what's happening. Rebase is 
kind of a tricky beast; a basic rule of thumb is that you should only 
use it to go forward in time on a single upstream branch, not to hop 
between upstream branches. Its behavior in non-forward-in-time cases is 
predictable once you know how it works, but not necessarily intuitive.

What are you expecting rebase to do here? We can probably suggest some 
other commands that will do what you're hoping to do. My hunch is that 
you're trying to use it to effectively do a merge of your "a" and "b" 
branches, but maybe I'm wrong about that.

-Steve

^ permalink raw reply

* git and larger trees, not so fast?
From: moe @ 2007-08-09 16:30 UTC (permalink / raw)
  To: git

hi guys,

earlier today i imported one of my larger trees
(~70k files) into git and was quite disappointed
by the performance.

i made some tests on latest master branch
(1.5.3.rc4.29.g74276) and it seems like git
hits a wall somewhere above ~50k files.

i'm seeing 'commit' timings of 30s and
up as well as 'status' timings in the 10s
ballpark.

here's a test-case (should be safe to
copy/paste on linux, bash):

#
# first create a tree of roughly 100k files
#
mkdir bummer
cd bummer
for ((i=0;i<100;i++)); do
mkdir $i && pushd $i;
for ((j=0;j<1000;j++)); do
echo "$j" >$j; done; popd;
done

#
# init and add this to git
#
time git init
git config user.email "no@thx"
git config user.name "nothx"
time git add .
time git commit -m 'buurrrrn' -a

#
# git-status, tunes in at around ~10s for me
#
time git-status
time git-status
time git-status

#
# git-commit, takes a whopping 52s for me
#
date >50/500
time git commit -m 'expose the turtle' 50/500


regards,
moe

^ permalink raw reply

* Re: 'pu' branch for StGIT
From: Pavel Roskin @ 2007-08-09 16:33 UTC (permalink / raw)
  To: Karl Hasselström; +Cc: git, Catalin Marinas, Yann Dirson
In-Reply-To: <20070809141848.GA6342@diana.vm.bytemark.co.uk>

On Thu, 2007-08-09 at 16:18 +0200, Karl Hasselström wrote:

> You should be able to do something like
> 
>   $ stg applied > .git/patches/branch/applied
>   $ stg unapplied > .git/patches/branch/unapplied
> 
> and then manually change the version from 3 to 2, and be ready to go.
> I haven't tested this, though!

That seems to work.  Thank you!  "branch" should be substituted with the
current branch, of course.

> > I have noticed two problems so far, but I cannot tell if they are
> > specific to the "pu" branch.
> >
> > 1) Undead patches.
> 
> I saw the same problem today. I haven't had time to look into it, but
> I believe it's due to stgit trying to directly modify files under
> .git/refs instead of using git-update-ref, which breaks with packed
> refs. The DAG patches rely much more on the refs, so the bug is more
> severe in that case.
> 
> https://gna.org/bugs/?9710

I've attached the test case to that bug.  You are right, git-gc is involved.

> > 2) Invisible branches.
> 
> I haven't seen this problem at all -- in my repositories, "stg branch
> -l" just works. Will try to reproduce (hopefully tonight). Do you have
> a recipe on how to reproduce this from scratch?

It's a problem with git-gc too!  Just clone some repository and run "stg
branch -l" in it.  It will show master.  Run git-gc, and "stg branch -l"
will show "No branches".

I see that in my Linux repository there are files
in .git/refs/remotes/wireless-dev but not in other directories
under .git/refs/remotes/

-- 
Regards,
Pavel Roskin

^ permalink raw reply

* Re: git and larger trees, not so fast?
From: Linus Torvalds @ 2007-08-09 17:11 UTC (permalink / raw)
  To: moe; +Cc: git
In-Reply-To: <20070809163026.GD568@mbox.bz>



On Thu, 9 Aug 2007, moe wrote:
> 
> i made some tests on latest master branch
> (1.5.3.rc4.29.g74276) and it seems like git
> hits a wall somewhere above ~50k files.

Good catch. Definitely not acceptable performance.

We seem to spend a lot of our time in memcpy:

	samples  %        image name               app name                 symbol name
	200527   25.4551  libc-2.6.so              libc-2.6.so              _wordcopy_bwd_aligned
	104505   13.2660  libc-2.6.so              libc-2.6.so              _wordcopy_fwd_aligned
	99185    12.5907  libz.so.1.2.3            libz.so.1.2.3            (no symbols)
	83452    10.5935  libc-2.5.so              libc-2.5.so              (no symbols)
	54203     6.8806  git                      git                      assign_blame
	46153     5.8587  git                      git                      read_directory_recursive
	27665     3.5118  git                      git                      handle_split
	21385     2.7146  vmlinux                  vmlinux                  blk_complete_sgv4_hdr_rq
	20745     2.6334  git                      git                      read_packed_refs
	12709     1.6133  git                      git                      builtin_diffstat
	7829      0.9938  git                      git                      show_patch_diff
	...

but the silly thing is, this is only true if you give the filenames 
explicitly!

Lookie here:

	[torvalds@woody bummer]$ date >50/500
	[torvalds@woody bummer]$ time git commit -a -m 'expose the turtle'
	Created commit 25ca22d: expose the turtle
	 1 files changed, 1 insertions(+), 1 deletions(-)
	
	real    0m4.612s
	user    0m4.224s
	sys     0m0.412s

	[torvalds@woody bummer]$ date >50/500
	[torvalds@woody bummer]$ time git commit -m 'expose the turtle' 50/500
	Created commit 009f6b5: expose the turtle
	 1 files changed, 1 insertions(+), 1 deletions(-)
	
	real    0m12.464s
	user    0m12.129s
	sys     0m0.336s

ie we take almost three times longer with explicitly naming the file, than 
when just using "git commit -a". Oops.

That said, even the 4.6 seconds is really not acceptable: this is on a 
good 2.6GHz Core 2 Duo too, so on weaker hardware it would be quite 
painful.

I haven't looked at *why* it's that slow, but it's not anything really 
fundamental, the basic operations are fast:

	[torvalds@woody bummer]$ time git add 50/500

	real    0m0.064s
	user    0m0.048s
	sys     0m0.016s

	[torvalds@woody bummer]$ time git write-tree
	7480230419e510c93082a4a19e23d928a426973a
	
	real    0m0.069s
	user    0m0.048s
	sys     0m0.024s

	[torvalds@woody bummer]$ time git diff
	
	real    0m0.127s
	user    0m0.000s
	sys     0m0.000s

so it's not the "lstat()" that we do on all files, or the write-tree 
(which are all O(n) in files, with a rather small constant), but some 
O(n**2) behaviour elsewhere.

And all the expense seems to be in not the commit itself, but in

	[torvalds@woody bummer]$ time git 'runstatus' '--nocolor'

	real    0m4.208s
	user    0m4.068s
	sys     0m0.140s

and that thing seems to suck really really hard.

Doing an ltrace on it shows tons and tons of:

	...
	strlen("35")
	strlen("349")
	calloc(1, 72)
	memcpy(0x73034e, "10/", 3)
	memcpy(0x730351, "349", 4)
	memmove(0x2ab637f41e80, 0x2ab637f41e78, 781768)
	...

but I haven't looked at where they come from yet.

		Linus

^ permalink raw reply

* Re: git and larger trees, not so fast?
From: Linus Torvalds @ 2007-08-09 17:38 UTC (permalink / raw)
  To: moe; +Cc: git
In-Reply-To: <alpine.LFD.0.999.0708090948250.25146@woody.linux-foundation.org>



On Thu, 9 Aug 2007, Linus Torvalds wrote:
> 
> Doing an ltrace on it shows tons and tons of:
> 
> 	...
> 	strlen("35")
> 	strlen("349")
> 	calloc(1, 72)
> 	memcpy(0x73034e, "10/", 3)
> 	memcpy(0x730351, "349", 4)
> 	memmove(0x2ab637f41e80, 0x2ab637f41e78, 781768)
> 	...
> 
> but I haven't looked at where they come from yet.

Ouch. It's the diffing between HEAD and the index, and it's all from 
"add_index_entry()", which sorts the index array using an insertion sort. 
So when the index array gets large, that sort spends all its time in huge 
memmove() calls.
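
A minimal sketch of why that hurts, not git's actual code: the index is
stood in for by a plain sorted array of ints, and the names
sorted_insert() and total_shifts_descending() are made up for this
illustration. Each insert shifts the tail of the array with memmove(),
so in the worst case building an n-entry array shifts about n*n/2
entries in total.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the index array: a sorted array of ints.
 * Insert `value` at its sorted position and return how many existing
 * elements had to be shifted by memmove() to make room.
 * The caller guarantees that arr has space for nr + 1 elements.
 */
size_t sorted_insert(int *arr, size_t nr, int value)
{
	size_t lo = 0, hi = nr;

	/* Binary search for the first element that is >= value. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (arr[mid] < value)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* Shift the tail up by one slot: this is the expensive part. */
	memmove(arr + lo + 1, arr + lo, (nr - lo) * sizeof(*arr));
	arr[lo] = value;
	return nr - lo;
}

/*
 * Insert the values n, n-1, ..., 1 (worst case: every new value goes
 * to the front) and return the total number of elements shifted.
 */
size_t total_shifts_descending(int *arr, size_t n)
{
	size_t nr = 0, moved = 0;

	for (size_t i = n; i > 0; i--) {
		moved += sorted_insert(arr, nr, (int)i);
		nr++;
	}
	return moved;
}
```

Finding the slot is cheap (binary search); it is the shifting that
grows quadratically with the number of entries.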

The silly thing, of course, is that we don't even "need" to do that: both 
the index and the trees are really sorted already, so we could just 
interleave them. But since we read them separately, the thing just sucks.
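
The interleaving could look roughly like the following single merge
pass. This is only a sketch under simplifying assumptions: entries are
bare path strings compared with strcmp(), and merge_sorted_names() is a
hypothetical helper, not a function from git.

```c
#include <stddef.h>
#include <string.h>

/*
 * Merge two already-sorted arrays of names into `out`, which must have
 * room for na + nb entries. A name present in both inputs is emitted
 * once. Returns the number of entries written.
 */
size_t merge_sorted_names(const char **a, size_t na,
			  const char **b, size_t nb,
			  const char **out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < na && j < nb) {
		int cmp = strcmp(a[i], b[j]);

		if (cmp < 0) {
			out[n++] = a[i++];
		} else if (cmp > 0) {
			out[n++] = b[j++];
		} else {
			/* Same name on both sides: keep one copy. */
			out[n++] = a[i++];
			j++;
		}
	}

	/* At most one of these loops has anything left to copy. */
	while (i < na)
		out[n++] = a[i++];
	while (j < nb)
		out[n++] = b[j++];
	return n;
}
```

Every entry is looked at once and nothing is ever shifted, so the cost
is linear in the combined size of the two inputs.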

We've fixed other similar cases of this we had (diffing trees against each 
other) by walking the trees together, but the "index vs tree" diff (and 
merge) is the one remaining place where we still use the original stupid 
algorithm. So you'll see this performance problem for 

 - diff tree against index ("git diff HEAD")
 - merge tree into index ("git read-tree -m HEAD")

which both do the stupid index/tree filling.

So this is all O(n**2), which is why we haven't reacted very much - it 
doesn't show up nearly as much with the kernel. Also, with a smaller set 
of files, it tends to fit in the L2 cache of most competent CPUs. 
So not only is it n**2, you get the cache thrashing behaviour too, and 
that, I think, is what really causes it to fall off the cliff edge!

Gaah. This shouldn't be *that* hard to fix, but I'm not entirely sure I'll 
have time today.

Diffing the index against the tree *should* be instantaneous. It should be 
no more costly than reading the tree itself (which is 0.191 seconds for 
me: test "git read-tree -m HEAD" vs "git read-tree HEAD") and reading the 
index (which is almost instantaneous - the only way I can test it is by 
doing something like "git update-index --refresh", and that's 0.131 
seconds, but that includes all the 100,000 "lstat()" calls).

So basically, we're spending several seconds just doing stupid 
make-believe work and moving the index array around. Ouch.

Anyway, the good news is that this is by no means fundamental. It's a 
small and stupid detail. The only thing that makes it at all painful is 
that this is in some low-level crud that we haven't touched in *ages*, so 
I've long since swapped out all my recollection of how we do it.

(We basically do:

	read_cache();

followed by

	unpack_trees();

and each of those *on*its*own* is pretty cheap, but when we unpack trees 
into an already populated index, the end result is ugly.)

			Linus

^ permalink raw reply

* Re: git and larger trees, not so fast?
From: Linus Torvalds @ 2007-08-09 17:54 UTC (permalink / raw)
  To: moe; +Cc: git
In-Reply-To: <alpine.LFD.0.999.0708090948250.25146@woody.linux-foundation.org>



On Thu, 9 Aug 2007, Linus Torvalds wrote:
> 
> We seem to spend a lot of our time in memcpy:
> 
> 	samples  %        image name               app name                 symbol name
> 	200527   25.4551  libc-2.6.so              libc-2.6.so              _wordcopy_bwd_aligned
> 	104505   13.2660  libc-2.6.so              libc-2.6.so              _wordcopy_fwd_aligned

Sorry, that was a bogus trace with some old stuff in it.

The real profile was this one.

	102343   73.1377  libc-2.6.so              libc-2.6.so              _wordcopy_bwd_aligned
	3573      2.5534  git                      git                      cache_name_compare
	2328      1.6637  git                      git                      index_name_pos
	...

which matches the rest of my emails.. (the "73%" is actually really 
supposed to be about 95%, but I had X running and doing stuff at the same 
time, so it was only 73% of all the other CPU activity that was going on 
over the time I profiled).

		Linus

^ permalink raw reply

* Re: git and larger trees, not so fast?
From: Junio C Hamano @ 2007-08-09 18:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: moe, git
In-Reply-To: <alpine.LFD.0.999.0708091015500.25146@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> So this is all O(n**2), which is why we haven't reacted very much - it 
> doesn't show up nearly as much with the kernel. Also, with a smaller set 
> of files, it tends to fit in the L2 cache of most competent CPUs. 
> So not only is it n**2, you get the cache thrashing behaviour too, and 
> that, I think, is what really causes it to fall off the cliff edge!
>
> Gaah. This shouldn't be *that* hard to fix, but I'm not entirely sure I'll 
> have time today.

One thing to keep in mind is that your earlier test of "git
write-tree" (or "git commit") followed by "git add a/file"
followed by "git write-tree" is extremely fast only because the
cache-tree in the index makes the last operation, which is
otherwise O(n) in write-tree, extremely cheap.

> Diffing the index against the tree *should* be instantaneous.

Right now we do not cull the subdirectories that we _know_ are
unchanged in "git diff-index --cached" using cache-tree, but
diffing the index against the tree could be instantaneous.

^ permalink raw reply

* Re: git and larger trees, not so fast?
From: Linus Torvalds @ 2007-08-09 18:06 UTC (permalink / raw)
  To: moe; +Cc: git
In-Reply-To: <alpine.LFD.0.999.0708091015500.25146@woody.linux-foundation.org>



On Thu, 9 Aug 2007, Linus Torvalds wrote:
> 
> Gaah. This shouldn't be *that* hard to fix, but I'm not entirely sure I'll 
> have time today.

In fact, I'm almost sure I will *not* have time today.

Anyway, the really trivial (and ugly) fix is to handle the cases of adding 
_independent_ stages to the index (which is the case for both "git 
diff-index" and "git read-tree -m") differently: instead of using the 
standard "add_index_entry()", which does all the complex sorting and 
checks that there aren't duplicates, we could do a much simpler one that 
just unconditionally appends to the end of the index.

This works, because when the stages are independent, there can be no index 
clashes (by definition).

Then, after adding all the stages, we could just do a "qsort()" on the 
result, and rather than having an expensive O(n**2) thing, we'd have a 
much nicer and well-behaved (with a smaller constant too) O(n*logn) thing.
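
A rough sketch of the append-then-sort idea, under simplifying
assumptions: entries are bare path strings rather than real index
entries with stages, and append_and_sort() and compare_names() are
hypothetical names, not git functions. The C standard does not promise
a complexity for qsort(), but common implementations are O(n*logn) in
practice.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int compare_names(const void *a, const void *b)
{
	const char *const *x = a;
	const char *const *y = b;

	return strcmp(*x, *y);
}

/*
 * Append all `n_extra` names to `names` (which already holds `nr`
 * entries and has room for nr + n_extra), then sort the whole array
 * once. Returns the new number of entries.
 */
size_t append_and_sort(const char **names, size_t nr,
		       const char **extra, size_t n_extra)
{
	for (size_t i = 0; i < n_extra; i++)
		names[nr++] = extra[i];	/* O(1) per entry, no memmove */

	qsort(names, nr, sizeof(*names), compare_names);
	return nr;
}
```

The point of the design is that no duplicate check and no shifting
happens per entry; the ordering is restored in one pass at the end.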

I bet it's just ~50 lines of code, it really shouldn't be that hard to do. 
I just won't be able to do it and test it until late tonight or tomorrow, 
I suspect.

Sadly, this is an area that is almost exclusively mine and Junio's. I'd 
love for somebody else to get their feet wet, but doing a

	gitk read-cache.c

shows that few enough people have done anything really fundamental in this 
file..

			Linus

^ permalink raw reply

* Re: git and larger trees, not so fast?
From: David Kastrup @ 2007-08-09 18:06 UTC (permalink / raw)
  To: git
In-Reply-To: <alpine.LFD.0.999.0708090948250.25146@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 9 Aug 2007, moe wrote:
>> 
>> i made some tests on latest master branch
>> (1.5.3.rc4.29.g74276) and it seems like git
>> hits a wall somewhere above ~50k files.
>
> Good catch. Definitely not acceptable performance.
>
> We seem to spend a lot of our time in memcpy:

[...]

> Doing an ltrace on it shows tons and tons of:
>
> 	...
> 	strlen("35")
> 	strlen("349")
> 	calloc(1, 72)
> 	memcpy(0x73034e, "10/", 3)
> 	memcpy(0x730351, "349", 4)
> 	memmove(0x2ab637f41e80, 0x2ab637f41e78, 781768)
> 	...
>
> but I haven't looked at where they come from yet.

Ok, preaching to the pope here, but: moving memory is bad.  Make sure
data can stay where it starts.  In particular: don't use realloc.  And
if you do, grow the size _exponentially_ (like a factor of 1.5).  If
you grow the size exponentially, the moving costs the algorithm only
O(n) in total (amortized).  If data stays put, even in badly scattered
linear lists, we have O(n) as well.  If you grow realloc _linearly_
(constant size increment), then the algorithm is hit with O(n^2).
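
To put numbers on the growth policies, here is a small counting sketch.
It assumes the worst case where every realloc has to move the whole
array, and copies_while_growing() is a made-up helper for illustration.
The geometric branch uses (alloc + 16) * 3 / 2, which is an assumption
chosen to resemble a factor-1.5 policy.

```c
#include <stddef.h>

/*
 * Count how many element copies a worst-case realloc() would perform
 * while growing an array one element at a time up to `n` elements,
 * assuming every reallocation has to move the whole array.
 * If `geometric` is nonzero the capacity grows by a factor of 1.5
 * (plus a small constant); otherwise it grows by a fixed `step`.
 */
size_t copies_while_growing(size_t n, int geometric, size_t step)
{
	size_t alloc = 0, copies = 0;

	for (size_t nr = 0; nr < n; nr++) {
		if (nr >= alloc) {
			copies += nr;	/* existing elements get moved */
			if (geometric)
				alloc = (alloc + 16) * 3 / 2;
			else
				alloc += step;
		}
	}
	return copies;
}
```

For 3200 elements with a fixed step of 32 this counts 158400 copies,
while the geometric policy stays within a small multiple of the element
count.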

Technically, basically _all_ operations of git can be done by doing
list merges of presorted lists.  The index can be kept sorted.  All
requests into the index can be collected in readdir order into one
linear list, this gets sorted once using a merge sort with O(n lg n),
then it gets merged with the index O(n+N).  As long as the whole index
is read in, it can't be done faster.  It is not necessary to organize
the read data into trees or more complicated structures: a single
linear list is sufficient.  One can use the hierarchical structure of
a directory to shave off some part of the sorting cost, though, and it
considerably will lessen memory impact (and copying costs) if a
file/directory/tree entry can contain a relative file name and a
"pointer to prefix" where the rest of the file path is to be found.

Anyway, so much for some theory.  Now let's look at bad points in the
code, judging from your benchmarks.

A grep for realloc is appalling.  Let's see what is actually involved
here.

attr.c:

struct git_attr *git_attr(const char *name, int len)
{

	a->attr_nr = attr_nr++;
	git_attr_hash[pos] = a;

	check_all_attr = xrealloc(check_all_attr,
				  sizeof(*check_all_attr) * attr_nr);

[...]

Full O(n^2) behavior of the worst kind (increment 1!).

builtin-commit-tree.c:

static void add_buffer(char **bufp, unsigned int *sizep, const char *fmt, ...)
{
	size = *sizep;
	newsize = size + len + 1;
	alloc = (size + 32767) & ~32767;

[size rounded to next 32k (inconsistent! needs to be size+1 rounded up)]

	buf = *bufp;
	if (newsize > alloc) {
		alloc = (newsize + 32767) & ~32767;

[newsize rounded to next 32k]

		buf = xrealloc(buf, alloc);

[O(n^2): constant increment.  Important?  No idea.]

		*bufp = buf;
	}
	*sizep = newsize - 1;

	memcpy(buf + size, one_line, len);
}

[...]

int register_commit_graft(struct commit_graft *graft, int ignore_dups)
{

[...]
	if (commit_graft_alloc <= ++commit_graft_nr) {
		commit_graft_alloc = alloc_nr(commit_graft_alloc);
		commit_graft = xrealloc(commit_graft,
					sizeof(*commit_graft) *
					commit_graft_alloc);
	}
	if (pos < commit_graft_nr)
		memmove(commit_graft + pos + 1,
			commit_graft + pos,
			(commit_graft_nr - pos - 1) *
			sizeof(*commit_graft));
	commit_graft[pos] = graft;
	return 0;
}

Eeek.  Start with a linear list, not an array.

objects.c:


void add_object_array_with_mode(struct object *obj, const char *name, struct object_array *array, unsigned mode)
{
	unsigned nr = array->nr;
	unsigned alloc = array->alloc;
	struct object_array_entry *objects = array->objects;

	if (nr >= alloc) {
		alloc = (alloc + 32) * 2;
		objects = xrealloc(objects, alloc * sizeof(*objects));
		array->alloc = alloc;

Doubling (plus a constant), so this one is actually geometric growth:
amortized O(n), not O(n^2).

pathlist.c:

static int add_entry(struct path_list *list, const char *path)
{
	int exact_match;
	int index = get_entry_index(list, path, &exact_match);

	if (exact_match)
		return -1 - index;

	if (list->nr + 1 >= list->alloc) {
		list->alloc += 32;
		list->items = xrealloc(list->items, list->alloc
				* sizeof(struct path_list_item));
	}

Constant increment, O(n^2).

That's just a cursory examination.  In my opinion, pretty much every
realloc should be replaced by some sort of list structure.  That would
be the nicest thing.  I have sped up some awk scripts that built up
argument lists by a factor of 100 by replacing
a[index] = a[index] " " thenewstuff
with
a[index,nr[index]++] = thenewstuff
and then never concatenating the strings, but just outputting them in
a loop.


Anyway, short of that, don't realloc by fixed increments, always use
alloc_nr as soon as multiple reallocs are to be expected.

And they certainly are in some of the above cited code passages.

-- 
David Kastrup

^ permalink raw reply

* Re: git and larger trees, not so fast?
From: Junio C Hamano @ 2007-08-09 18:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: moe, git
In-Reply-To: <alpine.LFD.0.999.0708091056180.25146@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 9 Aug 2007, Linus Torvalds wrote:
>> 
>> Gaah. This shouldn't be *that* hard to fix, but I'm not entirely sure I'll 
>> have time today.
>
> In fact, I'm almost sure I will *not* have time today.
>
> Anyway, the really trivial (and ugly) fix is to handle the cases of adding 
> _independent_ stages to the index (which is the case for both "git 
> diff-index" and "git read-tree -m") differently...
> ...
> Sadly, this is an area that is almost exclusively mine and Junio's. I'd 
> love for somebody else to get their feet wet,...

I hopefully have some time this evening to look into this, if
not earlier.

^ permalink raw reply

* Re: [PATCH] Documentation/git-svn: Instructions for cloning a git-svn-created repository
From: Eric Wong @ 2007-08-09 19:37 UTC (permalink / raw)
  To: Adam Roben; +Cc: git, Junio C Hamano, Shawn O. Pearce
In-Reply-To: <1186388203181-git-send-email-aroben@apple.com>

Adam Roben <aroben@apple.com> wrote:
> These instructions tell you how to create a clone of a repository created with
> git-svn, that can in turn be used with git-svn.
> 
> Signed-off-by: Adam Roben <aroben@apple.com>
> ---
> > gitster: (3) you prepare one git-svn managed git repository, allow others to
> > clone it via git, and have each of these cloned git repositories to interact
> > with svn via git-svn -- this third mode of operation is not supported.
> > 
> > spearce: be nice if someone who cared about git-svn supporting (3) either wrote
> > a patch for the documentation, or taught the tool how to do this more
> > automatically.
> 
> Here's that patch. Maybe I'll get around to Shawn's second (far more ideal)
> suggestion sometime.
> 
>  Documentation/git-svn.txt |   19 +++++++++++++++++++
>  1 files changed, 19 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/git-svn.txt b/Documentation/git-svn.txt
> index 0a210e4..3e3b597 100644
> --- a/Documentation/git-svn.txt
> +++ b/Documentation/git-svn.txt
> @@ -435,6 +435,25 @@ Tracking and contributing to an entire Subversion-managed project
>  # of dcommit/rebase/show-ignore should be the same as above.
>  ------------------------------------------------------------------------
>  
> +The initial 'git-svn clone' of a Subversion repository can be quite time-consuming (especially
> +for large repositories). If multiple people (or one person with multiple
> +machines) want to use git-svn to interact with the same Subversion repository,
> +you can do the initial 'git-svn clone' to a repository on a server and have
> +each person clone that repository with 'git clone':
> +
> +------------------------------------------------------------------------
> +# Do the initial import on a server
> +	ssh server "cd /pub && git-svn clone http://svn.foo.org/project"
> +# Clone locally
> +	git clone server:/pub/project
> +# Tell git-svn which branch contains the Subversion commits
> +	git update-ref refs/remotes/git-svn origin/master
> +# Initialize git-svn locally (be sure to use the same URL and -T/-b/-t options as were used on server)
> +	git-svn init http://svn.foo.org/project
> +# Pull the latest changes from Subversion
> +	git-svn rebase
> +------------------------------------------------------------------------
> +
>  REBASE VS. PULL/MERGE
>  ---------------------

This method won't get branches and tags under the refs/remotes/
namespace, will it?

I personally believe using rsync to clone repositories created with
git-svn is the simplest and best method for now.

-- 
Eric Wong

^ permalink raw reply

* Re: Git'ing a non-labeled set of sources
From: Jan Hudec @ 2007-08-09 20:02 UTC (permalink / raw)
  To: Sparks, Sam; +Cc: git
In-Reply-To: <CF7E46FCFF66AD478BB72724345289EC170CE4@twx-exch01.twacs.local>

[-- Attachment #1: Type: text/plain, Size: 2065 bytes --]

On Wed, Aug 08, 2007 at 13:59:38 -0500, Sparks, Sam wrote:
> Hello All,
> 
> Please excuse me if this is an ignorant question; I'm new to git and may
> have overlooked something in the documentation.
> 
> I'm attempting to obtain a snapshot of source code from an unlabeled git
> branch in a public repository. I've found in the documentation that a
> timestamp cannot be used to specify a particular version of source code,
> but I believe I can work with the commit value as returned by 'git
> show'.

Clone and pull, over the git protocol, only ask the server for objects
referenced by refs -- plus all objects those depend on. That does not
necessarily mean all commits, because references may be removed.

If a branch (i.e. a ref in refs/heads) is removed or rewound (moved to point
to a commit that is not a descendant of what it pointed to before), some
commits may become dangling. Clone and pull won't know such commits exist and
therefore won't be able to ask the server to provide them.
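
A toy model of the reachability rule described above, assuming commits are
stored in a small array and refer to their parents by index. This is not
git's actual object walk; struct commit, mark_reachable() and
mark_from_refs() are hypothetical names for illustration.

```c
#include <stddef.h>

#define MAX_PARENTS 2

/* Toy commit: parents are indices into the commit array, -1 if unused. */
struct commit {
	int parents[MAX_PARENTS];
	int reachable;
};

/* Mark a commit and everything it depends on (its ancestors). */
static void mark_reachable(struct commit *commits, int idx)
{
	int i;

	if (idx < 0 || commits[idx].reachable)
		return;
	commits[idx].reachable = 1;
	for (i = 0; i < MAX_PARENTS; i++)
		mark_reachable(commits, commits[idx].parents[i]);
}

/* Only commits reachable from some ref would be offered to a client. */
static void mark_from_refs(struct commit *commits, const int *refs, size_t nr_refs)
{
	size_t i;

	for (i = 0; i < nr_refs; i++)
		mark_reachable(commits, refs[i]);
}
```

A commit that was the tip of a rewound branch has no ref pointing at it or
at any of its descendants, so it stays unmarked in this model; that is the
"dangling" case.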

> However, I have been unsuccessful in my attempts to use this identifier
> to clone or checkout the associated source tree. Has anyone been
> successful in using git to successfully replicate an unlabeled version
> of sources in a repository?
> 
> Here is my latest attempt:
> /dir_i_want_to_replicate $ git show --pretty=short
> commit 5b1313fb2758ffce8b624457f777d8cc6709608d
> Author: ....

I bet that if you do 'git lost-found' here, it will find something and this
commit will be among the finds, or predecessor of one of them.

> /replication_dir $ git clone git://www.denx.de/git/u-boot.git
> u-boot-mpc83xx
> Blah blah blah..
>  100% (4378/4378) done
> /replication_dir/u-boot-mpc83xx/ $ git checkout
> 5b1313fb2758ffce8b624457f777d8cc6709608d
> error: pathspec '5b1313fb2758ffce8b624457f777d8cc6709608d' did not match
> any. 

It seems that git pull only accepts refs as arguments. So you'll have to
create a branch at that commit in the origin repository to get it over to
the destination.

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: Workflow question: A case for git-rebase?
From: Jan Hudec @ 2007-08-09 20:30 UTC (permalink / raw)
  To: Thomas Adam; +Cc: Johannes Schindelin, git
In-Reply-To: <18071eea0708081456l2ff1b73dy90ef33c1b5058c77@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3834 bytes --]

On Wed, Aug 08, 2007 at 22:56:33 +0100, Thomas Adam wrote:
> On 08/08/07, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > Hi,
> >
> > On Wed, 8 Aug 2007, Thomas Adam wrote:
> >
> > > As for myself, I maintain _locally_ a few branches (branchX, branchY)
> > > which dictate some bits and pieces I'm working on.  Periodically, I
> > > will tend to merge either merge to master and then push those changes
> > > out.  So far so good...
> > >
> > > But, I've now come up against a case whereby if one of my colleagues
> > > changes a file (call it fileA) in branch master, and, in the course of
> > > my working in branchX means i modify fileA also, when I come to merge
> > > branchX into master I find the original change in master (as submitted
> > > by my colleague) being reverted by my changes in branchX.
> >
> > I have a hard time seeing that.  If you touch the same code,
> > unidentically, merge-recursive will not be nice to you: it will show
> > conflicts, and you have to resolve them.
> >
> > Or do you use "-s ours"?
> 
> No, nothing like that.  I have had a situation whereby a merge from
> branchX to master has resulted in master's changes to fileA being
> reverted based on what was in the contents of fileA in branchX -- this
> is of course wrong though -- master has the most recent copy.  My
> solution therefore was to cherry pick the commit into branchX and
> remerge into master.  This is why I was forced to ask about whether or
> not git-rebase was the correct way to go.

Git rebase is a correct way to go; it has the advantage of a simpler
history and the disadvantage of slightly harder conflict resolution (since
you merge one commit at a time rather than in one big block). However, merge
is an equally correct way to go.

Either there is a bug in merge -- which I would consider rather unlikely,
though not impossible -- or you actually did, probably unintentionally, undo
the master's changes. This might happen if:

 - You try to merge, either in --no-commit mode or with a conflict, so the
   merge is not committed.
 - You then decide you don't want to resolve it now and undo the changes by
   checking out the files, but don't clear the information about the merge
   in progress.
 - You commit some changes in that state. This records a merge that reverts
   all changes from master.

Similarly, an attempt to merge just some of the files would result in a
problem like the one you describe -- merging is only supported on whole trees.

> Although I suppose this leads me to the ancillary question of:  At the
> point I merged master into branchX did this cause any problems for any
> future merges of branchX into master?   I cannot recall if this
> "revert scenario" I describe to master happened pre or past my merge
> of master into branchX, but I suspect it was after I had merged master
> into branchX.

Merge is a completely symmetrical operation in git. Merging branchX into
master and merging master into branchX is the same for all purposes (though
you can tell which way you did it from the order of the parents in the commit
object).

Repeated merges between two branches are allowed and are always a correct
thing to do. However, you should be aware that an attempt to reject a change
will be recorded as a reversal and merged as such. You can try visualizing
the situation before the merge with:
  gitk master-before-merge...branchX-before-merge
  (or equivalently: gitk merge-result^...merge-result^2)
print the base revision with:
  git merge-base master-before-merge branchX-before-merge
  (or equivalently: git merge-base merge-result^ merge-result^2)
look it up in the graph, and contemplate what could have caused the reversal.
I expect you can't disclose the code to ask anybody to help you with that.

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: 'pu' branch for StGIT
From: Karl Hasselström @ 2007-08-09 20:39 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: git, Catalin Marinas, Yann Dirson
In-Reply-To: <1186677210.31394.28.camel@dv>

On 2007-08-09 12:33:30 -0400, Pavel Roskin wrote:

> On Thu, 2007-08-09 at 16:18 +0200, Karl Hasselström wrote:
>
> > You should be able to do something like
> >
> >   $ stg applied > .git/patches/branch/applied
> >   $ stg unapplied > .git/patches/branch/unapplied
> >
> > and then manually change the version from 3 to 2, and be ready to
> > go. I haven't tested this, though!
>
> That seems to work. Thank you!

What? You mean I didn't forget any step? ;-)

> "branch" should be substituted with the current branch, of course.

I was just too lazy to type the brackets. Or maybe I'm just evil and
_want_ the lock-in effect. Who knows? :-)

> > I saw the same problem today.
> >
> > https://gna.org/bugs/?9710
>
> I've attached the test case to that bug. You are right, git-gc is
> involved.

Thanks.

> > I haven't seen this problem at all -- in my repositories, "stg
> > branch -l" just works. Will try to reproduce (hopefully tonight).
> > Do you have a recipe on how to reproduce this from scratch?
>
> It's a problem with git-gc too! Just clone some repository and run
> "stg branch -l" in it. It will show master. Run git-gc, and "stg
> branch -l" will show "No branches".

Well, well! In that case, I'm off to kill two birds with one stone!

> I see that in my Linux repository there are files in
> .git/refs/remotes/wireless-dev but not in other directories under
> .git/refs/remotes/

Presumably the only nonpacked refs are the ones that have been updated
after the last ref packing. Or however it works.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply

* Re: git and larger trees, not so fast?
From: Junio C Hamano @ 2007-08-09 20:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: moe, git
In-Reply-To: <7vmyx0y3vp.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> In fact, I'm almost sure I will *not* have time today.
>>
>> Anyway, the really trivial (and ugly) fix is to handle the cases of adding 
>> _independent_ stages to the index (which is the case for both "git 
>> diff-index" and "git read-tree -m") differently...
>> ...
>> Sadly, this is an area that is almost exclusively mine and Junio's. I'd 
>> love for somebody else to get their feet wet,...
>
> I hopefully have some time this evening to look into this, if
> not earlier.

So here is what I did during the lunch break.

wt-status has two calls to run_diff_index() to do the equivalent
of "git diff --cached".  This codepath is the sole caller of
read_tree(), and before calling read_tree(), vacates stage #1
entries, and reads tree contents to the index at stage #1,
without any funky "merge" magic.

This changes read_tree() to first make sure that there are no
existing cache entries at the specified stage (from the above
description, you can see this is not strictly needed if we are
only interested in supporting existing callers), and if that is
the case, it runs add_cache_entry() with the (new)
ADD_CACHE_JUST_APPEND flag, and then sorts the resulting cache
using qsort().

add_cache_entry() has been taught to omit all the checks such as
"Does this path already exist?  Does adding this path remove
other existing entries because it turns a directory to a file?"
and appends the given cache entry straight at the end of the
active cache.

The appending and sorting at the end destroys cache-tree
optimization, but we are not writing the resulting index out
anyway, so that is not a problem.
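
A standalone sketch of the idea, using plain strings instead of cache
entries. It assumes the caller has already made the array large enough;
insert_sorted() and append_then_sort() are hypothetical names and are not
part of the patch below.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() passes pointers to array elements, i.e. pointers to "const char *". */
static int cmp_names(const void *a_, const void *b_)
{
	const char *a = *(const char *const *)a_;
	const char *b = *(const char *const *)b_;
	return strcmp(a, b);
}

/* Slow way: keep the array sorted on every insert (a scan plus a memmove each time). */
static void insert_sorted(const char **names, size_t *nr, const char *name)
{
	size_t pos = 0;

	while (pos < *nr && strcmp(names[pos], name) < 0)
		pos++;
	if (pos < *nr)
		memmove(names + pos + 1, names + pos,
			(*nr - pos) * sizeof(*names));
	names[pos] = name;
	(*nr)++;
}

/* Quick way: append everything unchecked, then sort once at the end. */
static void append_then_sort(const char **names, size_t *nr,
			     const char *const *extra, size_t extra_nr)
{
	size_t i;

	for (i = 0; i < extra_nr; i++)
		names[(*nr)++] = extra[i];
	qsort(names, *nr, sizeof(*names), cmp_names);
}
```

Both helpers end up with the same ordering; the second one does no
per-entry shifting, which is the part that gets expensive when many
entries are added in one go.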

I do not know if this "fixes" the performance problem or not (I
do not have that much time during the day), so I would not call
this a "fix" yet, but at least the _change_ looks trivially
correct, and passes all the existing tests.

Interested parties may want to try it and see if it shifts the
bottleneck.

---

 cache.h      |    1 +
 read-cache.c |   20 +++++++++++++++-
 tree.c       |   69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 85 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index e97af18..e5276e6 100644
--- a/cache.h
+++ b/cache.h
@@ -258,6 +258,7 @@ extern int index_name_pos(struct index_state *, const char *name, int namelen);
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
 #define ADD_CACHE_OK_TO_REPLACE 2	/* Ok to replace file/directory */
 #define ADD_CACHE_SKIP_DFCHECK 4	/* Ok to skip DF conflict checks */
+#define ADD_CACHE_JUST_APPEND 8		/* Append only; tree.c::read_tree() */
 extern int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
 extern struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int really);
 extern int remove_index_entry_at(struct index_state *, int pos);
diff --git a/read-cache.c b/read-cache.c
index e060392..865369d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -665,7 +665,7 @@ static int check_file_directory_conflict(struct index_state *istate,
 	return retval + has_dir_name(istate, ce, pos, ok_to_replace);
 }
 
-int add_index_entry(struct index_state *istate, struct cache_entry *ce, int option)
+static int add_index_entry_with_check(struct index_state *istate, struct cache_entry *ce, int option)
 {
 	int pos;
 	int ok_to_add = option & ADD_CACHE_OK_TO_ADD;
@@ -707,6 +707,22 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
 		pos = index_name_pos(istate, ce->name, ntohs(ce->ce_flags));
 		pos = -pos-1;
 	}
+	return pos + 1;
+}
+
+int add_index_entry(struct index_state *istate, struct cache_entry *ce, int option)
+{
+	int pos;
+
+	if (option & ADD_CACHE_JUST_APPEND)
+		pos = istate->cache_nr;
+	else {
+		int ret;
+		ret = add_index_entry_with_check(istate, ce, option);
+		if (ret <= 0)
+			return ret;
+		pos = ret - 1;
+	}
 
 	/* Make sure the array is big enough .. */
 	if (istate->cache_nr == istate->cache_alloc) {
@@ -717,7 +733,7 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
 
 	/* Add it in.. */
 	istate->cache_nr++;
-	if (istate->cache_nr > pos)
+	if (istate->cache_nr > pos + 1)
 		memmove(istate->cache + pos + 1,
 			istate->cache + pos,
 			(istate->cache_nr - pos - 1) * sizeof(ce));
diff --git a/tree.c b/tree.c
index 04fe653..8c0819f 100644
--- a/tree.c
+++ b/tree.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "cache-tree.h"
 #include "tree.h"
 #include "blob.h"
 #include "commit.h"
@@ -7,7 +8,7 @@
 
 const char *tree_type = "tree";
 
-static int read_one_entry(const unsigned char *sha1, const char *base, int baselen, const char *pathname, unsigned mode, int stage)
+static int read_one_entry_opt(const unsigned char *sha1, const char *base, int baselen, const char *pathname, unsigned mode, int stage, int opt)
 {
 	int len;
 	unsigned int size;
@@ -25,7 +26,23 @@ static int read_one_entry(const unsigned char *sha1, const char *base, int basel
 	memcpy(ce->name, base, baselen);
 	memcpy(ce->name + baselen, pathname, len+1);
 	hashcpy(ce->sha1, sha1);
-	return add_cache_entry(ce, ADD_CACHE_OK_TO_ADD|ADD_CACHE_SKIP_DFCHECK);
+	return add_cache_entry(ce, opt);
+}
+
+static int read_one_entry(const unsigned char *sha1, const char *base, int baselen, const char *pathname, unsigned mode, int stage)
+{
+	return read_one_entry_opt(sha1, base, baselen, pathname, mode, stage,
+				  ADD_CACHE_OK_TO_ADD|ADD_CACHE_SKIP_DFCHECK);
+}
+
+/*
+ * This is used when the caller knows there is no existing entries at
+ * the stage that will conflict with the entry being added.
+ */
+static int read_one_entry_quick(const unsigned char *sha1, const char *base, int baselen, const char *pathname, unsigned mode, int stage)
+{
+	return read_one_entry_opt(sha1, base, baselen, pathname, mode, stage,
+				  ADD_CACHE_JUST_APPEND);
 }
 
 static int match_tree_entry(const char *base, int baselen, const char *path, unsigned int mode, const char **paths)
@@ -119,9 +136,55 @@ int read_tree_recursive(struct tree *tree,
 	return 0;
 }
 
+static int cmp_cache_name_compare(const void *a_, const void *b_)
+{
+	const struct cache_entry *ce1, *ce2;
+
+	ce1 = *((const struct cache_entry **)a_);
+	ce2 = *((const struct cache_entry **)b_);
+	return cache_name_compare(ce1->name, ntohs(ce1->ce_flags),
+				  ce2->name, ntohs(ce2->ce_flags));
+}
+
 int read_tree(struct tree *tree, int stage, const char **match)
 {
-	return read_tree_recursive(tree, "", 0, stage, match, read_one_entry);
+	read_tree_fn_t fn = NULL;
+	int i, err;
+
+	/*
+	 * Currently the only existing callers of this function all
+	 * call it with stage=1 and after making sure there is nothing
+	 * at that stage; we could always use read_one_entry_quick().
+	 *
+	 * But when we decide to straighten out git-read-tree not to
+	 * use unpack_trees() in some cases, this will probably start
+	 * to matter.
+	 */
+
+	/*
+	 * See if we have cache entry at the stage.  If so,
+	 * do it the original slow way, otherwise, append and then
+	 * sort at the end.
+	 */
+	for (i = 0; !fn && i < active_nr; i++) {
+		struct cache_entry *ce = active_cache[i];
+		if (ce_stage(ce) == stage)
+			fn = read_one_entry;
+	}
+
+	if (!fn)
+		fn = read_one_entry_quick;
+	err = read_tree_recursive(tree, "", 0, stage, match, fn);
+	if (fn == read_one_entry || err)
+		return err;
+
+	/*
+	 * Sort the cache entry -- we need to nuke the cache tree, though.
+	 */
+	cache_tree_free(&active_cache_tree);
+	qsort(active_cache, active_nr, sizeof(active_cache[0]),
+	      cmp_cache_name_compare);
+	return 0;
 }
 
 struct tree *lookup_tree(const unsigned char *sha1)

^ permalink raw reply related

* Re: git and larger trees, not so fast?
From: Sean @ 2007-08-09 20:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, moe, git
In-Reply-To: <7v7io4xwvp.fsf@assigned-by-dhcp.cox.net>

On Thu, 09 Aug 2007 13:42:50 -0700
Junio C Hamano <gitster@pobox.com> wrote:


> I do not know if this "fixes" the performance problem or not (I
> do not have that much time during the day), so I would not call
> this a "fix" yet, but at least the _change_ looks trivially
> correct, and passes all the existing tests.
> 
> Interested parties may want to try it and see if it shifts the
> bottleneck.

Junio,

This makes things _much_ better, however the final commit in the 
test script still shows a lot of user time:

## time git init
real    0m0.005s
user    0m0.001s
sys     0m0.004s

## time git add . 
real    0m3.501s
user    0m1.268s
sys     0m2.159s

## time git commit -q -m 'buurrrrn' -a
real    0m2.299s
user    0m1.065s
sys     0m1.317s

## time git status
real    0m1.107s
user    0m0.548s
sys     0m0.557s

## time git status
real    0m1.122s
user    0m0.545s
sys     0m0.557s

## time git status
real    0m1.142s
user    0m0.545s
sys     0m0.576s

## time git commit -q -m 'hurry' 50/500
real    0m16.944s
user    0m15.466s
sys     0m1.133s


Cheers,
Sean

^ permalink raw reply

* Re: [PATCH] git-merge: do up-to-date check also for strategies ours, subtree.
From: Junio C Hamano @ 2007-08-09 21:11 UTC (permalink / raw)
  To: Gerrit Pape; +Cc: git
In-Reply-To: <20070809120831.19319.qmail@a61af064a2a242.315fe32.mid.smarden.org>

Right now I do not have time to dig through the mailing list
archive from around mid March 2006, and I do not recall the
requestor's original rationale, but I have a vague recollection
that we added this "no fast-forward check" specifically in
response to a user request.

^ permalink raw reply

* Re: msysgit: does git gui work?
From: Steffen Prohaska @ 2007-08-09 21:23 UTC (permalink / raw)
  To: Shawn O. Pearce, Johannes Schindelin; +Cc: Marius Storm-Olsen, Git Mailing List
In-Reply-To: <3CD6111C-13B5-444C-A28C-A7445C8A199B@zib.de>


On Aug 9, 2007, at 9:24 AM, Steffen Prohaska wrote:

> Does 'git gui' work for you in msysgit?
>
> I get
>
> Invalid command name "git-version"
>   while executing
> "git-version >= 1.5.3"
>    (in namespace eval "::blame" script line 36)
> [...]
>
> with msysgit (v1.5.3-rc2-690-g8ca1f6a)

Ok this is a bit complex but simple to solve: I created a
symlink from tclsh84.exe to tclsh.exe, that is

    cd /mingw/bin
    ln -s tclsh84.exe tclsh.exe

And then run

    make install

[
offtopic:
I also hacked the Makefile to set GIT_VERSION to normal
version number, like 99.9, because git gui complained that
it failed to parse the version number.

I'll look into this later.
]

Now the long story.

mingw only contains tclsh84 but not tclsh. This causes
the Makefile in git-gui to fail on the creation of
lib/tclIndex. Therefore git gui decides to take the slow
path of sourcing the files in lib explicitly, but this fails
because they are sourced before git-version is defined.
Therefore blame.tcl reports the error mentioned above.

Johannes (or someone else from the msysgit team),
We should modify mingw to contain the symlink to tclsh.
Or something similar, at least 'tclsh' should be there.

Shawn,
The fallback mechanism of sourcing files from lib is broken.
Either git-version must be defined before sourcing them, or
the auto_index must always work.

	Steffen

^ permalink raw reply

