Git development
 help / color / mirror / Atom feed
* Re: git and larger trees, not so fast?
From: Linus Torvalds @ 2007-08-10 15:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sean, moe, git
In-Reply-To: <7vy7gkue5s.fsf@assigned-by-dhcp.cox.net>



On Thu, 9 Aug 2007, Junio C Hamano wrote:
> 
> FWIW, moe's script with and without two patches gives these
> numbers for me.

Btw, I really think it's worth doing even just the hacky patches at this 
stage, even though it's late in the game for 1.5.3.

That performance problem is serious enough that I'd call it a major bug. 
Performance has always been one of the goals of git, and when you have a 
difference between 17s and 0.7s for "git status", that's a *huge* 
usability thing. It would be sad to release 1.5.3 with a known bug.

[ Some people don't think performance issues are "real bugs", and I think 
  such people shouldn't be allowed to program. ]

Side note: your first patch is actually quite noticeable on even just the 
kernel. Not nearly as much, but without it, I get about 0.5s, and with it, 
I get consistently under 0.3s. So it's about a 40% improvement even for 
smaller projects (and it's probably much more if you have a CPU with a 
smaller cache: my Core 2 Duo has 4MB of L2 cache, and a lot of the index 
will even fit in the L1 - a slower CPU with less cache will see a bigger 
impact, and with smaller repositories, from the unnecessary memory 
moving).

While 0.5s -> 0.3s may not sound like much, on a slower machine where it 
might otherwise be 2.5s -> 1.5s, that's likely to be quite noticeable.

In fact, I can tell even on my machine: 0.3s is visible as a "I'm clearly 
thinking about it" delay (quite frankly, it would be better at 0.1s, which 
is "immediate"), but 0.5s is already approaching the point where you 
actually wait for the answer (rather than just notice that it wasn't quite 
immediate).

				Linus

^ permalink raw reply

* Re: [PATCH 2/2] tweak manpage formatting
From: Brian Gernhardt @ 2007-08-10 15:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Carlos Rica
In-Reply-To: <7vbqdfvng9.fsf@assigned-by-dhcp.cox.net>


On Aug 10, 2007, at 3:49 AM, Junio C Hamano wrote:

> This attempts to force fixed-font in manpages for literal
> blocks.  I have tested this with docbook 1.71 and it seems to
> work as expected.

This broke building the manpages with docbook-xsl 1.69.  Apparently  
it doesn't understand "man.indent.width".  I fixed it by updating to  
1.71 from fink.  I don't know if 1.71 is the minimum supported  
version so thought this might be worth a note.

~~ Brian

^ permalink raw reply

* Re: git and larger trees, not so fast?
From: Linus Torvalds @ 2007-08-10 16:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sean, moe, git
In-Reply-To: <7v643ovyli.fsf@assigned-by-dhcp.cox.net>



On Thu, 9 Aug 2007, Junio C Hamano wrote:
> > 
> > (I didn't test it, though, maybe I missed something).
> 
> I do not think the change affects the normal codepath.  The
> one-liner patch to git-commit.sh touches the codepath that
> updates the index used only to write out the partial commit, and
> losing the cached stat info from that index does not matter, as
> that index is removed immediately after writing the tree out and
> is never compared with working tree as far as I can tell.

You are, of course, mostly right. Using "-m" there is largely pointless, 
since it's a throw-away index, and we'll only ever use the exact paths 
that were given to us.

However, it does actually matter for one case: the case where you give a 
directory name or other name pattern, resulting in a *lot* of filenames. 
In that case, the commit will end up piping that (potentially very large) 
list to "git update-index --add --remove --stdin", and that will now mean 
that they *all* get their SHA1's recomputed.

Of course, that was the other performance bug that we already knew about 
(except we were thining "git add .", and fixed that case). So we're 
already slow at it - but we *shouldn't* be.

Try this on the kernel archive (use a clean one, so these things *should* 
all be no-ops):

	time sh -c "git add . ; git commit"

which is nice and fast and takes just over a second for me, but then try

	time git commit .

which *should* be nice and fast, but it takes forever, because we now 
re-compute all the SHA1's for *every* file. Of course, if it's all in the 
cache, it's still just 4s for me, but I tried with a cold cache, and it 
was over half a minute!

(I don't actually ever do something like "git commit .", but I could see 
people doing it. What I *do* do is that if I have multiple independent 
changes, I may actually do "git commit fs" to commit just part of them, 
and rather than list all the files, I literally just say "commit that 
sub-tree". So this really is another valid performance issue).

Sad.

			Linus

^ permalink raw reply

* Re: Bug in gitk: can't unset "idinlist(...) ..."
From: Jeff King @ 2007-08-10 16:11 UTC (permalink / raw)
  To: Brian Hetro; +Cc: git
In-Reply-To: <20070810154108.GA779@ruiner>

On Fri, Aug 10, 2007 at 11:41:08AM -0400, Brian Hetro wrote:

> Hi,
> I have a problem with gitk not being able to show one of my
> repositories (git version 1.5.3.rc4.41.g7efe).  I get this error while
> gitk starts:
> 
> can't unset "idinlist(f1d795add789ec43d3ccf1d35f3c39fb464f6e72)": no
> such element in array
> [...]
> 
> I performed a bisect and commit
> 1ed84157a21a3e868228b15588e4aadfbe5a030b appears to be the culprit
> (Revert 88494423 (removal of duplicate parents in the output
> codepath)).

This was fixed in e1abc69b, which is in 1.5.3-rc3, covered in this
thread:

http://thread.gmane.org/gmane.comp.version-control.git/53126

Can you confirm that it is still a problem in 1.5.3-rc4?

-Peff

^ permalink raw reply

* [fixed PATCH] git-filter-branch.sh: Fix broken setting of GIT_DIR
From: David Kastrup @ 2007-08-10 16:21 UTC (permalink / raw)
  To: git
In-Reply-To: <868x8j7aj2.fsf@lola.quinscape.zz>

If filter-branch is entered with an unset GIT_DIR, things are rather
fragile.  The GIT_DIR variable setting then points to something like
$(pwd)/../.. which is neither guaranteed to be a git directory
(depends on where filter-branch is started), nor will it continue to
work once the temporary directory (for which the pwd is output) ceases
to exist.

So we just call git-rev-parse in order to get the correct setting here
for exporting.

Signed-off-by: David Kastrup <dak@gnu.org>
---
 git-filter-branch.sh |   13 +++++--------
 1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index b5fa449..9e9e8bf 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -170,14 +170,11 @@ do
 	esac
 done < "$tempdir"/backup-refs
 
-case "$GIT_DIR" in
-/*)
-	;;
-*)
-	GIT_DIR="$(pwd)/../../$GIT_DIR"
-	;;
-esac
-export GIT_DIR GIT_WORK_TREE=.
+GIT_DIR=$(cd ../..;cd "./$(git-rev-parse --git-dir)";pwd)
+
+GIT_WORK_TREE=.
+
+export GIT_DIR GIT_WORK_TREE
 
 # These refs should be updated if their heads were rewritten
 
-- 
1.5.3.rc4.74.g3739cb17

^ permalink raw reply related

* Re: git-p4 question
From: Alex Riesen @ 2007-08-10 16:25 UTC (permalink / raw)
  To: Govind Salinas; +Cc: git
In-Reply-To: <69b0c0350708100817u58f74acbyed5c19830cb92ed7@mail.gmail.com>

On 8/10/07, Govind Salinas <govindsalinas@gmail.com> wrote:
> I am having some problems running the script.  It seems that (line
> 1425) $f->{sha1} is uninitialized and causing the import to fail (the
> perforce path is missing the leading "//" I don't know if that is
> intentional).

the line in question deals with local pathnames, so it is expected.
What i don't understand is how did you manage to have the file
without sha1... Could you put a

 print "$tmpfile\n";

after line 1400?
 open($ft,'>',$tmpfile) or die "$tmpfile: $!\n";

and run git-p4-import -v -v -v?
Than it possible to compare lists of the files which is acquired
through p4 change with the list of files output by p4 print.

What perforce client version do you have, BTW?

>  Let me know if you want to continue this off list since
> it isn't really a part of git.

It isn't. We can continue privately

^ permalink raw reply

* Relative alternates question
From: Jan Hudec @ 2007-08-10 16:45 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1444 bytes --]

Hello,

Trying to solve a problem with object/info/alternates and http (a rather
problematic combination) I noticed the following code in sha1_file.c:336 in
function link_alt_odb_entries.
(current master -- v1.5.3-rc4-41-g7efeb8f):

                        if ((*last != '/') && depth) {
                                error("%s: ignoring relative alternate object store %s",
                                                relative_base, last);
                        } else {
                                link_alt_odb_entry(last, cp - last,
                                                relative_base, depth);
                        }

The last is (if I understood the code correctly) begining of line, cp is end
of that line and depth is depth of recursion in resolving the alternates. Now
unless I read the code completely wrong, it means, that when git reads
a repository, it resolves relative paths in it's objects/info/alternates, but
if that contains further alternates, it would ignore relative alternate paths
there -- and therefore not find objects needed from there.

And my question is, is there any good reason to reject relative paths in
alternates of an alternate? From what I see the recursive call to
link_alt_odb_entries (via link_alt_odb_entry and read_info_alternates) has
all the information it needs to resolve such paths.

Thanks,

Jan

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: msysgit: does git gui work?
From: Steffen Prohaska @ 2007-08-10 16:47 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Marius Storm-Olsen, Git Mailing List, Shawn O. Pearce
In-Reply-To: <Pine.LNX.4.64.0708101452060.21857@racer.site>


On Aug 10, 2007, at 3:53 PM, Johannes Schindelin wrote:

> On Fri, 10 Aug 2007, Steffen Prohaska wrote:
>
>> I have a mob for /.git, but I do not have the setup for /git/.git.  
>> Maybe
>> I deleted it because I didn't understand what is means.
>
> Ah, I misunderstood.  Yes, it is quite possible to have a mob  
> installed
> for 4msysgit.git by default.  Should by done in
> msysgit.git:share/GitMe/setup-msysgit.sh.

Yeah, but what is the right URL to push mob to? I wasn't abel to
figure it out.


>> Whoever has setup the mob configurations, maybe it would be a good  
>> idea
>> to forbid non-fast-forward but instead allow the creation of new mob*
>> branches. If I can't push to mob, I could push to mob-topic instead.
>> Cleanup would be in the responsibility of the repository owner.
>
> This is not possible.  The refusal of a non-fast-forward is a per- 
> repo,
> not a per-user, configuration.

I see.

I expected some scripting magic in place that already deals with  
handling
the mob user, and expected it could be used to deny fast-forwards on  
a per
user basis. But I haven't looked into the git scripting hooks so far.
So I have basically no clue about what I'm talking here ;)

	Steffen

^ permalink raw reply

* Fix "git commit directory/" performance anomaly
From: Linus Torvalds @ 2007-08-10 16:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sean, moe, git
In-Reply-To: <alpine.LFD.0.999.0708100852540.30176@woody.linux-foundation.org>


This trivial patch avoids re-hashing files that are already clean in the 
index. This mirrors what commit 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe 
did for "git add .", only for "git commit ." instead.

This improves the cold-cache case immensely, since we don't need to bring 
in all the file contents, just the index and any files dirty in the index.

Before:

	[torvalds@woody linux]$ time git commit .
	real    1m49.537s
	user    0m3.892s
	sys     0m2.432s

After:

	[torvalds@woody linux]$ time git commit .
	real    0m14.273s
	user    0m1.312s
	sys     0m0.516s

(both after doing a "echo 3 > /proc/sys/vm/drop_caches" to get cold-cache 
behaviour - even with the index optimization git still has to "lstat()" 
all the files, so with a truly cold cache, bringing all the inodes in 
will take some time).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

On Fri, 10 Aug 2007, Linus Torvalds wrote:
> 
> Try this on the kernel archive (use a clean one, so these things *should* 
> all be no-ops):
> 
> 	time sh -c "git add . ; git commit"
> 
> which is nice and fast and takes just over a second for me, but then try
> 
> 	time git commit .
> 
> which *should* be nice and fast, but it takes forever, because we now 
> re-compute all the SHA1's for *every* file. Of course, if it's all in the 
> cache, it's still just 4s for me, but I tried with a cold cache, and it 
> was over half a minute!
> 
> (I don't actually ever do something like "git commit .", but I could see 
> people doing it. What I *do* do is that if I have multiple independent 
> changes, I may actually do "git commit fs" to commit just part of them, 
> and rather than list all the files, I literally just say "commit that 
> sub-tree". So this really is another valid performance issue).
> 
> Sad.
> 
> 			Linus
> 
---
 builtin-update-index.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/builtin-update-index.c b/builtin-update-index.c
index 509369e..8d22dfa 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -86,9 +86,15 @@ static int process_lstat_error(const char *path, int err)
 
 static int add_one_path(struct cache_entry *old, const char *path, int len, struct stat *st)
 {
-	int option, size = cache_entry_size(len);
-	struct cache_entry *ce = xcalloc(1, size);
+	int option, size;
+	struct cache_entry *ce;
+
+	/* Was the old index entry already up-to-date? */
+	if (old && !ce_stage(old) && !ce_match_stat(old, st, 0))
+		return;
 
+	size = cache_entry_size(len);
+	ce = xcalloc(1, size);
 	memcpy(ce->name, path, len);
 	ce->ce_flags = htons(len);
 	fill_stat_cache_info(ce, st);

^ permalink raw reply related

* Re: Bug in gitk: can't unset "idinlist(...) ..."
From: Jeff King @ 2007-08-10 16:55 UTC (permalink / raw)
  To: Brian Hetro; +Cc: git
In-Reply-To: <20070810161123.GA14875@sigill.intra.peff.net>

On Fri, Aug 10, 2007 at 12:11:23PM -0400, Jeff King wrote:

> http://thread.gmane.org/gmane.comp.version-control.git/53126
> 
> Can you confirm that it is still a problem in 1.5.3-rc4?

I just tried to trigger the original bug (by running "gitk arch/i386" in
the linux-2.6 repo) using 1.5.3.rc4.41.g7efe, and I couldn't reproduce
it.

Can you make your test repo available?

Note that the fix _isn't_ in gitk, but rather in git itself. Is it
possible that gitk is calling another, older version of git-rev-list
that you have installed somewhere?

-Peff

^ permalink raw reply

* Re: [PATCH 2/2] tweak manpage formatting
From: Junio C Hamano @ 2007-08-10 17:17 UTC (permalink / raw)
  To: Brian Gernhardt; +Cc: git, Carlos Rica
In-Reply-To: <0F764326-63E8-447D-A2D4-E56E999775D7@silverinsanity.com>

Thanks, I'll revert that one.

^ permalink raw reply

* Re: Fix "git commit directory/" performance anomaly
From: Linus Torvalds @ 2007-08-10 17:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sean, moe, git
In-Reply-To: <alpine.LFD.0.999.0708100924570.30176@woody.linux-foundation.org>



On Fri, 10 Aug 2007, Linus Torvalds wrote:
>
> This improves the cold-cache case immensely, since we don't need to bring 
> in all the file contents, just the index and any files dirty in the index.

The extreme - but not necessarily unusual - case of this is when the index 
and stat information is hot-cache (which is quite normal, in case you've 
done a "git diff" or something like that), but the whole source tree is 
otherwise not. Then the numbers look like this:

Before:
	[torvalds@woody linux]$ time git commit .
	real    1m33.751s
	user    0m3.864s
	sys     0m2.160s


After:
	[torvalds@woody linux]$ time git commit .
	real    0m1.415s
	user    0m1.176s
	sys     0m0.260s

The all-hot-cache numbers are better too, of course, just not nearly as 
noticeable:

Before:
	[torvalds@woody linux]$ time git commit .
	real    0m3.960s
	user    0m3.304s
	sys     0m0.672s

(and the after case is that 1.415s above, of course, so it's still more 
than twice as fast - it's just no longer a 60x performance difference!).

(Honesty in advertising: the "after" numbers also contain the "runstatus" 
speedup, which is the 0.5s -> 0.3s improvement)

			Linus

^ permalink raw reply

* Re: Bug in gitk: can't unset "idinlist(...) ..."
From: Jeff King @ 2007-08-10 17:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brian Hetro, git
In-Reply-To: <20070810154108.GA779@ruiner>

On Fri, Aug 10, 2007 at 11:41:08AM -0400, Brian Hetro wrote:

> I have a problem with gitk not being able to show one of my
> repositories (git version 1.5.3.rc4.41.g7efe).  I get this error while
> gitk starts:
> 
> can't unset "idinlist(f1d795add789ec43d3ccf1d35f3c39fb464f6e72)": no
> such element in array

Junio,

I did a little digging on this bug, since it was so similar to the
duplicate parent errors we were getting a few weeks ago. As it turns
out, Brian's repository, generated by hg2git, actually has commit
objects with duplicate parents in them. Older versions of git used to
remove duplicate parents for all commits, but due to your commits
11d65967 and 1ed84157, we now do it just for history simplification.
Which is why he was able to bisect it to 1ed84157, even though it is
perhaps not a git bug.

So maybe the right attitude is "hg2git should not be generating such
broken commits" (or "gitk should not barf on such broken commits" :) ),
but I thought I would mention it as an additional data point for those
changes. Should git handle duplicate parents of this fashion more
robustly? Or should we just assume that they should never have been
generated in the first place?

-Peff

^ permalink raw reply

* Re: Bug in gitk: can't unset "idinlist(...) ..."
From: Linus Torvalds @ 2007-08-10 18:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Brian Hetro, Git Mailing List, Paul Mackerras
In-Reply-To: <20070810173242.GA23628@coredump.intra.peff.net>



On Fri, 10 Aug 2007, Jeff King wrote:
> 
> So maybe the right attitude is "hg2git should not be generating such
> broken commits"

I think this is true.

> (or "gitk should not barf on such broken commits" :) ),

And I think this is *also* true.

> but I thought I would mention it as an additional data point for those
> changes. Should git handle duplicate parents of this fashion more
> robustly? Or should we just assume that they should never have been
> generated in the first place?

I think git itself is quite robust in the face of duplicate parents, and 
it really is a gitk bug that it has problems with them. That said, I don't 
think we should *assume* they don't happen, and while we should consider 
it a bug in hg2git that they did, it is not a "serious" bug per se. It's 
only gitk that reacts this violently to it.

I guess we could prune duplicate parents even for commits that didn't get 
rewritten, but I don't see why we really should even have to. I think Paul 
already said that he should look into it:

			    "I see from the following messages that the
    bug turned out to be elsewhere in git, but it looks like gitk should 
    be more robust and do something sensible rather than just throwing a 
    Tcl error.  I'll look at it."

so I think we should fix gitk regardless, and then *maybe* also consider 
doing parent simplification universally.

			Linus

^ permalink raw reply

* Re: Fix "git commit directory/" performance anomaly
From: Junio C Hamano @ 2007-08-10 18:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Sean, moe, git
In-Reply-To: <alpine.LFD.0.999.0708100924570.30176@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:

> This trivial patch avoids re-hashing files that are already clean in the 
> index. This mirrors what commit 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe 
> did for "git add .", only for "git commit ." instead.

Makes sense.  Thanks.

^ permalink raw reply

* Re: Fix "git commit directory/" performance anomaly
From: Linus Torvalds @ 2007-08-10 18:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sean, moe, git
In-Reply-To: <7vsl6rs0l5.fsf@assigned-by-dhcp.cox.net>



On Fri, 10 Aug 2007, Junio C Hamano wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > This trivial patch avoids re-hashing files that are already clean in the 
> > index. This mirrors what commit 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe 
> > did for "git add .", only for "git commit ." instead.
> 
> Makes sense.  Thanks.

Please don't apply that patch without this trivial fix.

I don't know why I didn't notice. It passed all the tests, but it really 
shouldn't have, and the compiler warned.

		Linus

---
diff --git a/builtin-update-index.c b/builtin-update-index.c
index 8d22dfa..a7a4574 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -91,7 +91,7 @@ static int add_one_path(struct cache_entry *old, const char *path, int len, stru
 
 	/* Was the old index entry already up-to-date? */
 	if (old && !ce_stage(old) && !ce_match_stat(old, st, 0))
-		return;
+		return 0;
 
 	size = cache_entry_size(len);
 	ce = xcalloc(1, size);

^ permalink raw reply related

* [PATCH 1/2] Move old index entry removal from "unpack_trees()" into the individual functions
From: Linus Torvalds @ 2007-08-10 19:15 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


This makes no changes to current code, but it allows the individual merge 
functions to decide what to do about the old entry.  They might decide to 
update it in place, rather than force them to always delete and re-add it.
    
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This commit on purpose does *not* change any behaviour, it just lays the 
groundwork for the next patch, which optimizes the hell out of the two 
common cases.

 unpack-trees.c |   29 +++++++++++++++++++++++------
 unpack-trees.h |   11 ++++++-----
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 5d1ffd1..7fed5d2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -58,10 +58,17 @@ static int entcmp(const char *name1, int dir1, const char *name2, int dir2)
 	return ret;
 }
 
+static inline void remove_entry(int remove)
+{
+	if (remove >= 0)
+		remove_cache_entry_at(remove);
+}
+
 static int unpack_trees_rec(struct tree_entry_list **posns, int len,
 			    const char *base, struct unpack_trees_options *o,
 			    struct tree_entry_list *df_conflict_list)
 {
+	int remove;
 	int baselen = strlen(base);
 	int src_size = len + 1;
 	int i_stk = i_stk;
@@ -145,10 +152,11 @@ static int unpack_trees_rec(struct tree_entry_list **posns, int len,
 
 		subposns = xcalloc(len, sizeof(struct tree_list_entry *));
 
+		remove = -1;
 		if (cache_name && !strcmp(cache_name, first)) {
 			any_files = 1;
 			src[0] = active_cache[o->pos];
-			remove_cache_entry_at(o->pos);
+			remove = o->pos;
 		}
 
 		for (i = 0; i < len; i++) {
@@ -214,13 +222,14 @@ static int unpack_trees_rec(struct tree_entry_list **posns, int len,
 						printf("\n");
 				}
 #endif
-				ret = o->fn(src, o);
+				ret = o->fn(src, o, remove);
 
 #if DBRT_DEBUG > 1
 				printf("Added %d entries\n", ret);
 #endif
 				o->pos += ret;
 			} else {
+				remove_entry(remove);
 				for (i = 0; i < src_size; i++) {
 					if (src[i]) {
 						add_cache_entry(src[i], ADD_CACHE_OK_TO_ADD|ADD_CACHE_SKIP_DFCHECK);
@@ -641,7 +650,8 @@ static void show_stage_entry(FILE *o,
 #endif
 
 int threeway_merge(struct cache_entry **stages,
-		struct unpack_trees_options *o)
+		struct unpack_trees_options *o,
+		int remove)
 {
 	struct cache_entry *index;
 	struct cache_entry *head;
@@ -657,6 +667,7 @@ int threeway_merge(struct cache_entry **stages,
 	int no_anc_exists = 1;
 	int i;
 
+	remove_entry(remove);		
 	for (i = 1; i < o->head_idx; i++) {
 		if (!stages[i] || stages[i] == o->df_conflict_entry)
 			any_anc_missing = 1;
@@ -809,12 +820,14 @@ int threeway_merge(struct cache_entry **stages,
  *
  */
 int twoway_merge(struct cache_entry **src,
-		struct unpack_trees_options *o)
+		struct unpack_trees_options *o,
+		int remove)
 {
 	struct cache_entry *current = src[0];
 	struct cache_entry *oldtree = src[1];
 	struct cache_entry *newtree = src[2];
 
+	remove_entry(remove);
 	if (o->merge_size != 2)
 		return error("Cannot do a twoway merge of %d trees",
 			     o->merge_size);
@@ -868,11 +881,13 @@ int twoway_merge(struct cache_entry **src,
  * stage0 does not have anything there.
  */
 int bind_merge(struct cache_entry **src,
-		struct unpack_trees_options *o)
+		struct unpack_trees_options *o,
+		int remove)
 {
 	struct cache_entry *old = src[0];
 	struct cache_entry *a = src[1];
 
+	remove_entry(remove);
 	if (o->merge_size != 1)
 		return error("Cannot do a bind merge of %d trees\n",
 			     o->merge_size);
@@ -891,11 +906,13 @@ int bind_merge(struct cache_entry **src,
  * - take the stat information from stage0, take the data from stage1
  */
 int oneway_merge(struct cache_entry **src,
-		struct unpack_trees_options *o)
+		struct unpack_trees_options *o,
+		int remove)
 {
 	struct cache_entry *old = src[0];
 	struct cache_entry *a = src[1];
 
+	remove_entry(remove);
 	if (o->merge_size != 1)
 		return error("Cannot do a oneway merge of %d trees",
 			     o->merge_size);
diff --git a/unpack-trees.h b/unpack-trees.h
index 9cd39a2..5517faa 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -4,7 +4,8 @@
 struct unpack_trees_options;
 
 typedef int (*merge_fn_t)(struct cache_entry **src,
-		struct unpack_trees_options *options);
+		struct unpack_trees_options *options,
+		int remove);
 
 struct unpack_trees_options {
 	int reset;
@@ -29,9 +30,9 @@ struct unpack_trees_options {
 extern int unpack_trees(unsigned n, struct tree_desc *t,
 		struct unpack_trees_options *options);
 
-int threeway_merge(struct cache_entry **stages, struct unpack_trees_options *o);
-int twoway_merge(struct cache_entry **src, struct unpack_trees_options *o);
-int bind_merge(struct cache_entry **src, struct unpack_trees_options *o);
-int oneway_merge(struct cache_entry **src, struct unpack_trees_options *o);
+int threeway_merge(struct cache_entry **stages, struct unpack_trees_options *o, int);
+int twoway_merge(struct cache_entry **src, struct unpack_trees_options *o, int);
+int bind_merge(struct cache_entry **src, struct unpack_trees_options *o, int);
+int oneway_merge(struct cache_entry **src, struct unpack_trees_options *o, int);
 
 #endif

^ permalink raw reply related

* Re: [PATCH 2/2] tweak manpage formatting
From: Carlos Rica @ 2007-08-10 19:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brian Gernhardt, git
In-Reply-To: <7v3ayrtil5.fsf@assigned-by-dhcp.cox.net>

I think this was because the end-of-line backslash "\" in the ASCII
art diagram in the Documentation/git-rev-parse.txt.

I think it should be an interesting feature for asciidoc to have long \
lines broken in text version, if only would be a way to disable it.

Tried someone to escape that backslash with another backslash before
(as in "\\")?

2007/8/10, Junio C Hamano <gitster@pobox.com>:
> Thanks, I'll revert that one.
>
>

^ permalink raw reply

* [PATCH 2/2] Optimize the common cases of git-read-tree
From: Linus Torvalds @ 2007-08-10 19:21 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List
In-Reply-To: <alpine.LFD.0.999.0708101213560.30176@woody.linux-foundation.org>


This optimizes bind_merge() and oneway_merge() to not unnecessarily
remove and re-add the old index entries when they can just get replaced
by updated ones.

This makes these operations much faster for large trees (where "large"
is in the 50,000+ file range), because we don't unnecessarily move index
entries around in the index array all the time.

Using the "bummer" tree (a test-tree with 100,000 files) we get:

Before:
	[torvalds@woody bummer]$ time git commit -m"Change one file" 50/500
	real    0m9.470s
	user    0m8.729s
	sys     0m0.476s

After:
	[torvalds@woody bummer]$ time git commit -m"Change one file" 50/500
	real    0m1.173s
	user    0m0.720s
	sys     0m0.452s

so for large trees this is easily very noticeable indeed.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Btw, these patches are based on top of my "speedup" branch, which includes 
the previous changes (including the change to use "struct tree_desc" in 
unpacking). However, apart from some context lines, they really should 
work fine on top of current master too, so if you don't want to take the 
"struct tree_desc" one, these patches should work fine even without it.

Also: this same "don't remove unnecessarily" case could (and should) be 
done for the two-way and three-way cases too, and it should be trivial to 
do. However, I didn't bother, since it wasn't the particular timings I was 
worried about. 

But doing that should help branch switching a lot, so I may send a "patch 
3/2" soon.

 unpack-trees.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 7fed5d2..b4e2618 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -887,7 +887,6 @@ int bind_merge(struct cache_entry **src,
 	struct cache_entry *old = src[0];
 	struct cache_entry *a = src[1];
 
-	remove_entry(remove);
 	if (o->merge_size != 1)
 		return error("Cannot do a bind merge of %d trees\n",
 			     o->merge_size);
@@ -912,13 +911,14 @@ int oneway_merge(struct cache_entry **src,
 	struct cache_entry *old = src[0];
 	struct cache_entry *a = src[1];
 
-	remove_entry(remove);
 	if (o->merge_size != 1)
 		return error("Cannot do a oneway merge of %d trees",
 			     o->merge_size);
 
-	if (!a)
+	if (!a) {
+		remove_entry(remove);
 		return deleted_entry(old, old, o);
+	}
 	if (old && same(old, a)) {
 		if (o->reset) {
 			struct stat st;

^ permalink raw reply related

* Re: [PATCH] Reinstate the old behaviour when GIT_DIR is set and GIT_WORK_TREE is unset
From: Junio C Hamano @ 2007-08-10 19:26 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: Johannes Schindelin, git
In-Reply-To: <20070810112821.GA11026@informatik.uni-freiburg.de>

Uwe Kleine-König  <ukleinek@informatik.uni-freiburg.de> writes:

> I don't know if you planed to make 
>
> 	git checkout-index --prefix=/tmp/tra -a
>
> work (again) in a bare repo.  Probably not, so it's no surprise that it
> still doesn't work.

I think it is now safe to revert that NEED_WORK_TREE change
in git.c for checkout-index.

^ permalink raw reply

* Re: [PATCH 1/2] Fix an illustration in git-rev-parse.txt
From: Carlos Rica @ 2007-08-10 19:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v643nvng8.fsf@assigned-by-dhcp.cox.net>

Could it be also work to have two backslashes to disable the special
behaviour of that one? I cannot test it now, but it would be the right
way to do it, I think.

2007/8/10, Junio C Hamano <gitster@pobox.com>:
> This hides the backslash at the end of line from AsciiDoc
> toolchain by introducing a trailing whitespace on one line in an
> illustration in git-rev-parse.txt.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>
>  * Obviously this needs to be applied without --whitespace=strip
>
>  Documentation/git-rev-parse.txt |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt
> index eea9c9c..4b4d229 100644
> --- a/Documentation/git-rev-parse.txt
> +++ b/Documentation/git-rev-parse.txt
> @@ -224,7 +224,7 @@ left-to-right.
>      G   H   I   J
>       \ /     \ /
>        D   E   F
> -       \  |  / \
> +       \  |  / \
>          \ | /   |
>           \|/    |
>            B     C
>
>

^ permalink raw reply

* [PATCH 3/2] Optimize the two-way merge of git-read-tree too
From: Linus Torvalds @ 2007-08-10 19:31 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List
In-Reply-To: <alpine.LFD.0.999.0708101216000.30176@woody.linux-foundation.org>


This trivially optimizes the two-way merge case of git-read-tree too, 
which affects switching branches.

When you have tons and tons of files in your repository, but there are 
only small differences in the branches (maybe just a couple of files 
changed), the biggest cost of the branch switching was actually just the 
index calculations.

This fixes it (timings for switching between the "testing" and "master" 
branches in the 100,000 file testing-repo-from-hell, where the branches 
only differ in one small file).

Before:
	[torvalds@woody bummer]$ time git checkout master
	real    0m9.919s
	user    0m8.461s
	sys     0m0.264s

After:
	[torvalds@woody bummer]$ time git checkout testing
	real    0m0.576s
	user    0m0.348s
	sys     0m0.228s

so it's easily an order of magnitude different.

This concludes the series. I think we could/should do the three-way merge 
too (to speed up merges), but I'm lazy. Somebody else can do it.

The rule is very simple: you need to remove the old entry if:
 - you want to remove the file entirely
 - you replace it with a "merge conflict" entry (ie a non-stage-0 entry)

and you can avoid removing it if you either

 - keep the old one
 - or resolve it to a new one.

and these rules should all be valid for the three-way case too.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
commit b8d8d6aa12a3ae7e2f7a8cb008413b780e1152ce
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Aug 10 12:13:41 2007 -0700

    Optimize the common cases of git-read-tree
    
    This optimizes bind_merge() and oneway_merge() to not unnecessarily
    remove and re-add the old index entries when they can just get replaced
    by updated ones.
    
    This makes these operations much faster for large trees (where "large"
    is in the 50,000+ file range), because we don't unnecessarily move index
    entries around in the index array all the time.
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 unpack-trees.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index b4e2618..810816e 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -827,7 +827,6 @@ int twoway_merge(struct cache_entry **src,
 	struct cache_entry *oldtree = src[1];
 	struct cache_entry *newtree = src[2];
 
-	remove_entry(remove);
 	if (o->merge_size != 2)
 		return error("Cannot do a twoway merge of %d trees",
 			     o->merge_size);
@@ -850,6 +849,7 @@ int twoway_merge(struct cache_entry **src,
 		}
 		else if (oldtree && !newtree && same(current, oldtree)) {
 			/* 10 or 11 */
+			remove_entry(remove);
 			return deleted_entry(oldtree, current, o);
 		}
 		else if (oldtree && newtree &&
@@ -859,6 +859,7 @@ int twoway_merge(struct cache_entry **src,
 		}
 		else {
 			/* all other failures */
+			remove_entry(remove);
 			if (oldtree)
 				reject_merge(oldtree);
 			if (current)
@@ -870,8 +871,8 @@ int twoway_merge(struct cache_entry **src,
 	}
 	else if (newtree)
 		return merged_entry(newtree, current, o);
-	else
-		return deleted_entry(oldtree, current, o);
+	remove_entry(remove);
+	return deleted_entry(oldtree, current, o);
 }
 
 /*

^ permalink raw reply related

* Re: [PATCH 2/2] tweak manpage formatting
From: Brian Gernhardt @ 2007-08-10 19:32 UTC (permalink / raw)
  To: Carlos Rica; +Cc: Junio C Hamano, git
In-Reply-To: <1b46aba20708101216s7660741ds5ef4c4c0fd13b45c@mail.gmail.com>


On Aug 10, 2007, at 3:16 PM, Carlos Rica wrote:

> I think this was because the end-of-line backslash "\" in the ASCII
> art diagram in the Documentation/git-rev-parse.txt.

No, it was complaining about unknown "man.indent.width" in  
callout.xsl.  Apparently docbook-xsl 1.69 doesn't like that.   
Updating it to 1.71 made it work again.  I lost the original error  
while updating before sending the first mail, unfortunately.  If it's  
truly needed, I probably still have the 1.69 deb file on my computer  
somewhere.

~~ Brian

^ permalink raw reply

* Re: Relative alternates question
From: Junio C Hamano @ 2007-08-10 19:38 UTC (permalink / raw)
  To: Jan Hudec; +Cc: git
In-Reply-To: <20070810164556.GB3442@efreet.light.src>

Jan Hudec <bulb@ucw.cz> writes:

> And my question is, is there any good reason to reject relative paths in
> alternates of an alternate? From what I see the recursive call to
> link_alt_odb_entries (via link_alt_odb_entry and read_info_alternates) has
> all the information it needs to resolve such paths.

As long as you are careful not to introduce loops that cause the
rest of the code to add the same thing twice, I do not think
there is anything fundamentally wrong with relative alternate
paths.  The original motivation of that check was not much more
than "let's not complicate our lives by supporting it", I think.

^ permalink raw reply

* Re: git and larger trees, not so fast?
From: Linus Torvalds @ 2007-08-10 19:39 UTC (permalink / raw)
  To: moe; +Cc: git
In-Reply-To: <20070809163026.GD568@mbox.bz>



On Thu, 9 Aug 2007, moe wrote:
> 
> earlier today i imported one of my larger trees
> (~70k files) into git and was quite disappointed
> by the performance.

Ok, I said I wouldn't have time to fix it yesterday, but today it's all 
done.

With the first fix from Junio yesterday (the one that fixed "git status"), 
and the fixes I've sent out today, your cases should not all be basically 
instantaneous (ie we're talking low seconds, even on not-the-fastest- 
possible-machines).

So with the following patches that were posted over the last 24 hours, you 
should be ok:

  Junio:
	Fix performance problem in "git status"

  Me:
	Start moving unpack-trees to "struct tree_desc"
	Fix "git commit directory/" performance anomaly (+ one-liner fix)
	Move old index entry removal from "unpack_trees()" into the individual functions
	Optimize the common cases of git-read-tree
	Optimize the two-way merge of git-read-tree too

(that patch from Junio was sent in an email in this thread, with the 
subject line "Re: git and larger trees, not so fast?" and a message ID of 
"<7v7io4xwvp.fsf@assigned-by-dhcp.cox.net>": the patches from me should 
all have the appropriate Subject lines and be findable that way).

If you can test with your real load to make sure, that would be good.

			Linus

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox