Git development
 help / color / mirror / Atom feed
* Re: [PATCH] Use a hashtable for objects instead of a sorted list
From: Junio C Hamano @ 2006-02-12  2:46 UTC (permalink / raw)
  To: Johannes Schindelin, Alexandre Julliard; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0602120254260.10235@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> In a simple test, this brings down the CPU time from 47 sec to 22 sec.

I was planning to take Alexandre's patch, but the approach your
patch takes feels more correct -- it scales with the number of
objects you need to handle, instead of having fixed 256
hashbuckets.

BTW, your version dumped core in hashtable_index immediately
after I started "git-rev-list --objects HEAD".  How did you get
_any_ CPU time?

I am not sure expecting that object name pointers are always
(unsigned int *) aligned as your patch does is OK.  We may want
to have something like the attached patch on top of yours.

I am also interested to find out how much the rehashing you do
when you update obj_allocs to a larger value is costing.

Alexandle, if you have a chance, could you try Johannes' patch
on your workload to see if it works OK for you?

-- >8 --
[PATCH] do not assume object name pointers are uint aligned.

Also fix an obvious bug that caused it dump core at my first
attempt.  There might be others but I did not actively look for
them.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff --git a/object.c b/object.c
index 3259862..59e5e36 100644
--- a/object.c
+++ b/object.c
@@ -13,17 +13,24 @@ int track_object_refs = 1;
 
 static int hashtable_index(const unsigned char *sha1)
 {
-	unsigned int i = *(unsigned int *)sha1;
-	return (int)(i % obj_allocs);
+	int cnt;
+	unsigned int ix = *sha1++;
+
+	for (cnt = 1; cnt < sizeof(unsigned int); cnt++) {
+		ix <<= 8;
+		ix |= *sha1++;
+	}
+	return (int)(ix % obj_allocs);
 }
 
 static int find_object(const unsigned char *sha1)
 {
-	int i = hashtable_index(sha1);
+	int i;
 
 	if (!objs)
 		return -1;
 
+	i = hashtable_index(sha1);
 	while (objs[i]) {
 		if (memcmp(sha1, objs[i]->sha1, 20) == 0)
 			return i;

^ permalink raw reply related

* Re: [PATCH] Optimization for git-rev-list --objects
From: Junio C Hamano @ 2006-02-12  2:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602111803350.3691@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> Hmm? That would be a lot faster than what we have now, I suspect, and none 
> of the ops are expensive.
>
> Anybody want to try this approach?

I am game.  I was looking at Johannes' patch that uses growing
circular hash and having the same thought.

^ permalink raw reply

* Re: [PATCH] Optimization for git-rev-list --objects
From: Linus Torvalds @ 2006-02-12  2:19 UTC (permalink / raw)
  To: Alexandre Julliard; +Cc: git
In-Reply-To: <87slqpg11q.fsf@wine.dyndns.org>



On Sat, 11 Feb 2006, Alexandre Julliard wrote:
> 
> When building a large list of objects, most of the time is spent in
> created_object() inserting the objects in the sorted list. This patch
> splits the list in 256 sublists to reduce the impact of the O(n^2)
> list insertion.
> 
> On the WineHQ repository (220,000 objects), the patch reduces the time
> needed for a 'git-rev-list --objects HEAD' from 2 minutes to 20 seconds.

Goodie. However, I wonder if we care about the sorting at _all_.

The sorting really only helps "find_object()", and the thing is, we could 
certainly do that other ways too. Keeping a sorted list is kind of 
senseless, when the data is so amenable to be kept in a tree.

We could add left/right pointers to each object, and just keep them in a 
tree. We probably don't even need any fancy balancing, since hashing means 
that we could just keep the first object we ever allocate as the root, and 
things should normally be pretty damn balanced.

Then we could do:

	static struct object *root = NULL;

	static struct object **lookup_object_position(unsigned char *sha1)
	{
		struct object **p = &root;

		for (;;) {
			struct object *object = *p;
			int sign;

			if (!object)
				break;
			sign = memcmp(sha1, object->sha1, 20)
			if (!sign)
				break;
			p = &object->left;
			if (sign < 0)
				continue;
			p = &object->right;
		}
		return p;
	}

and now we _trivially_ have:

	struct object *lookup_object(unsigned char *sha1)
	{
		return *lookup_object_position(unsigned char *sha1);
	}

	void created_object(struct object **pos, unsigned char *sha1, struct object *n)
	{
		memcpy(n->sha1, sha1, 20);
		*pos = n;
	}

and the "lookup_or_create()" functions would become

	struct tag *lookup_tag(unsigned char *sha1)
	{
		struct object **p = lookup_object_position(sha1);
		struct object *object;

		object = *p;
		if (!object) {
			struct tag *ret = xmalloc(sizeof(struct tag));
			memset(ret, 0, sizeof(struct tag));
			created_object(p, sha1, &ret->object);
			ret->object.type = tag_type;
			return ret;
		}
		if (!object->type)
			obj->type = tag_type;

		if (object->type != tag_type) {
			error("Object %s is a %s, not a tag",
				sha1_to_hex(sha1), obj->type);
			return NULL;
		}
		return (struct tag *)obj;
	}

etc etc.

Hmm? That would be a lot faster than what we have now, I suspect, and none 
of the ops are expensive.

Anybody want to try this approach?

		Linus

^ permalink raw reply

* [PATCH] fetch-clone progress: finishing touches.
From: Junio C Hamano @ 2006-02-12  1:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602111041430.3691@g5.osdl.org>

This makes fetch-pack also report the progress of packing part.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 * While we are doing eye-candy, this makes the silence after
   "Generating pack..." part a bit more bearable.

   Likes, dislikes, too-much's?

   BTW, don't you mean 512 down there???

        -	msecs += (int)(tv.tv_usec - prev_tv.tv_usec) >> 10;
        +	msecs += usec_to_binarymsec(tv.tv_usec - prev_tv.tv_usec);
        +
                if (msecs > 500) {
                        prev_tv = tv;

 clone-pack.c   |    4 ++--
 pack-objects.c |   43 +++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 43 insertions(+), 4 deletions(-)

21fcd1bdea2440236aea1713ea42a66bc2da5563
diff --git a/clone-pack.c b/clone-pack.c
index 719e1c4..a4370f5 100644
--- a/clone-pack.c
+++ b/clone-pack.c
@@ -125,9 +125,9 @@ static int clone_pack(int fd[2], int nr_
 	}
 	clone_handshake(fd, refs);
 
-	if (!quiet)
-		fprintf(stderr, "Generating pack ...\r");
 	status = receive_keep_pack(fd, "git-clone-pack", quiet);
+	if (!quiet)
+		fprintf(stderr, "\n");
 
 	if (!status) {
 		if (nr_match == 0)
diff --git a/pack-objects.c b/pack-objects.c
index c3f2531..2135e9a 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -3,6 +3,7 @@
 #include "delta.h"
 #include "pack.h"
 #include "csum-file.h"
+#include <sys/time.h>
 
 static const char pack_usage[] = "git-pack-objects [--non-empty] [--local] [--incremental] [--window=N] [--depth=N] {--stdout | base-name} < object-list";
 
@@ -26,6 +27,7 @@ static struct object_entry *objects = NU
 static int nr_objects = 0, nr_alloc = 0;
 static const char *base_name;
 static unsigned char pack_file_sha1[20];
+static int progress = 0;
 
 static void *delta_against(void *buf, unsigned long size, struct object_entry *entry)
 {
@@ -362,10 +364,13 @@ static void find_deltas(struct object_en
 	int i, idx;
 	unsigned int array_size = window * sizeof(struct unpacked);
 	struct unpacked *array = xmalloc(array_size);
+	int eye_candy;
 
 	memset(array, 0, array_size);
 	i = nr_objects;
 	idx = 0;
+	eye_candy = i - (nr_objects / 20);
+
 	while (--i >= 0) {
 		struct object_entry *entry = list[i];
 		struct unpacked *n = array + idx;
@@ -373,6 +378,10 @@ static void find_deltas(struct object_en
 		char type[10];
 		int j;
 
+		if (progress && i <= eye_candy) {
+			eye_candy -= nr_objects / 20;
+			fputc('.', stderr);
+		}
 		free(n->data);
 		n->entry = entry;
 		n->data = read_sha1_file(entry->sha1, type, &size);
@@ -404,11 +413,13 @@ static void prepare_pack(int window, int
 {
 	get_object_details();
 
-	fprintf(stderr, "Packing %d objects\n", nr_objects);
-
+	if (progress)
+		fprintf(stderr, "Packing %d objects", nr_objects);
 	sorted_by_type = create_sorted_list(type_size_sort);
 	if (window && depth)
 		find_deltas(sorted_by_type, window+1, depth);
+	if (progress)
+		fputc('\n', stderr);
 	write_pack_file();
 }
 
@@ -472,6 +483,10 @@ int main(int argc, char **argv)
 	int window = 10, depth = 10, pack_to_stdout = 0;
 	struct object_entry **list;
 	int i;
+	struct timeval prev_tv;
+	int eye_candy = 0;
+	int eye_candy_incr = 500;
+
 
 	setup_git_directory();
 
@@ -519,12 +534,34 @@ int main(int argc, char **argv)
 	if (pack_to_stdout != !base_name)
 		usage(pack_usage);
 
+	progress = isatty(2);
+
 	prepare_packed_git();
+	if (progress) {
+		fprintf(stderr, "Generating pack...\n");
+		gettimeofday(&prev_tv, NULL);
+	}
 	while (fgets(line, sizeof(line), stdin) != NULL) {
 		unsigned int hash;
 		char *p;
 		unsigned char sha1[20];
 
+		if (progress && (eye_candy <= nr_objects)) {
+			fprintf(stderr, "Counting objects...%d\r", nr_objects);
+			if (eye_candy && (50 <= eye_candy_incr)) {
+				struct timeval tv;
+				int time_diff;
+				gettimeofday(&tv, NULL);
+				time_diff = (tv.tv_sec - prev_tv.tv_sec);
+				time_diff <<= 10;
+				time_diff += (tv.tv_usec - prev_tv.tv_usec);
+				if ((1 << 9) < time_diff)
+					eye_candy_incr += 50;
+				else if (50 < eye_candy_incr)
+					eye_candy_incr -= 50;
+			}
+			eye_candy += eye_candy_incr;
+		}
 		if (get_sha1_hex(line, sha1))
 			die("expected sha1, got garbage:\n %s", line);
 		hash = 0;
@@ -537,6 +574,8 @@ int main(int argc, char **argv)
 		}
 		add_object_entry(sha1, hash);
 	}
+	if (progress)
+		fprintf(stderr, "Done counting %d objects.\n", nr_objects);
 	if (non_empty && !nr_objects)
 		return 0;
 
-- 
1.1.6.g69c5

^ permalink raw reply related

* [PATCH] Use a hashtable for objects instead of a sorted list
From: Johannes Schindelin @ 2006-02-12  1:57 UTC (permalink / raw)
  To: Alexandre Julliard; +Cc: git
In-Reply-To: <87slqpg11q.fsf@wine.dyndns.org>


In a simple test, this brings down the CPU time from 47 sec to 22 sec.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>

---

	On Sat, 11 Feb 2006, Alexandre Julliard wrote:

	> When building a large list of objects, most of the time is spent in
	> created_object() inserting the objects in the sorted list. This patch
	> splits the list in 256 sublists to reduce the impact of the O(n^2)
	> list insertion.

	Your patch brought down the CPU time to 27 sec in my test.

 fsck-objects.c |    5 +++-
 name-rev.c     |    7 +++---
 object.c       |   67 +++++++++++++++++++++++++++++++++-----------------------
 object.h       |    2 +-
 4 files changed, 48 insertions(+), 33 deletions(-)

diff --git a/fsck-objects.c b/fsck-objects.c
index 9950be2..6439d55 100644
--- a/fsck-objects.c
+++ b/fsck-objects.c
@@ -61,9 +61,12 @@ static void check_connectivity(void)
 	int i;
 
 	/* Look up all the requirements, warn about missing objects.. */
-	for (i = 0; i < nr_objs; i++) {
+	for (i = 0; i < obj_allocs; i++) {
 		struct object *obj = objs[i];
 
+		if (!obj)
+			continue;
+
 		if (!obj->parsed) {
 			if (!standalone && has_sha1_file(obj->sha1))
 				; /* it is in pack */
diff --git a/name-rev.c b/name-rev.c
index bdaa59b..08faa54 100644
--- a/name-rev.c
+++ b/name-rev.c
@@ -303,9 +303,10 @@ int main(int argc, char **argv)
 	} else if (all) {
 		int i;
 
-		for (i = 0; i < nr_objs; i++)
-			printf("%s %s\n", sha1_to_hex(objs[i]->sha1),
-					get_rev_name(objs[i]));
+		for (i = 0; i < obj_allocs; i++)
+			if (objs[i])
+				printf("%s %s\n", sha1_to_hex(objs[i]->sha1),
+						get_rev_name(objs[i]));
 	} else
 		for ( ; revs; revs = revs->next)
 			printf("%s %s\n", revs->name, get_rev_name(revs->item));
diff --git a/object.c b/object.c
index 1577f74..3259862 100644
--- a/object.c
+++ b/object.c
@@ -6,30 +6,32 @@
 #include "tag.h"
 
 struct object **objs;
-int nr_objs;
-static int obj_allocs;
+static int nr_objs;
+int obj_allocs;
 
 int track_object_refs = 1;
 
+static int hashtable_index(const unsigned char *sha1)
+{
+	unsigned int i = *(unsigned int *)sha1;
+	return (int)(i % obj_allocs);
+}
+
 static int find_object(const unsigned char *sha1)
 {
-	int first = 0, last = nr_objs;
+	int i = hashtable_index(sha1);
+
+	if (!objs)
+		return -1;
 
-        while (first < last) {
-                int next = (first + last) / 2;
-                struct object *obj = objs[next];
-                int cmp;
-
-                cmp = memcmp(sha1, obj->sha1, 20);
-                if (!cmp)
-                        return next;
-                if (cmp < 0) {
-                        last = next;
-                        continue;
-                }
-                first = next+1;
-        }
-        return -first-1;
+	while (objs[i]) {
+		if (memcmp(sha1, objs[i]->sha1, 20) == 0)
+			return i;
+		i++;
+		if (i == obj_allocs)
+			i = 0;
+	}
+	return -1 - i;
 }
 
 struct object *lookup_object(const unsigned char *sha1)
@@ -42,7 +44,7 @@ struct object *lookup_object(const unsig
 
 void created_object(const unsigned char *sha1, struct object *obj)
 {
-	int pos = find_object(sha1);
+	int pos;
 
 	obj->parsed = 0;
 	memcpy(obj->sha1, sha1, 20);
@@ -50,18 +52,27 @@ void created_object(const unsigned char 
 	obj->refs = NULL;
 	obj->used = 0;
 
-	if (pos >= 0)
-		die("Inserting %s twice\n", sha1_to_hex(sha1));
-	pos = -pos-1;
-
-	if (obj_allocs == nr_objs) {
-		obj_allocs = alloc_nr(obj_allocs);
+	if (obj_allocs - 1 <= nr_objs * 2) {
+		int i, count = obj_allocs;
+		obj_allocs = (obj_allocs < 32 ? 32 : 2 * obj_allocs);
 		objs = xrealloc(objs, obj_allocs * sizeof(struct object *));
+		memset(objs + count, 0, (obj_allocs - count)
+				* sizeof(struct object *));
+		for (i = 0; i < count; i++)
+			if (objs[i]) {
+				int j = find_object(objs[i]->sha1);
+				if (j != i) {
+					j = -1 - j;
+					objs[j] = objs[i];
+					objs[i] = NULL;
+				}
+			}
 	}
 
-	/* Insert it into the right place */
-	memmove(objs + pos + 1, objs + pos, (nr_objs - pos) * 
-		sizeof(struct object *));
+	pos = find_object(sha1);
+	if (pos >= 0)
+		die("Inserting %s twice\n", sha1_to_hex(sha1));
+	pos = -pos-1;
 
 	objs[pos] = obj;
 	nr_objs++;
diff --git a/object.h b/object.h
index 0e76182..e08afbd 100644
--- a/object.h
+++ b/object.h
@@ -23,7 +23,7 @@ struct object {
 };
 
 extern int track_object_refs;
-extern int nr_objs;
+extern int obj_allocs;
 extern struct object **objs;
 
 /** Internal only **/

^ permalink raw reply related

* Re: [PATCH 1/3] Call extended-semantics commands through variables.
From: Junio C Hamano @ 2006-02-12  0:36 UTC (permalink / raw)
  To: Jason Riedy; +Cc: git
In-Reply-To: <4230.1139699411@lotus.CS.Berkeley.EDU>

Jason Riedy <ejr@EECS.Berkeley.EDU> writes:

> Is there a better way of grabbing all the tags now?  I haven't
> kept track, as I haven't had to do that in a while.

Recent 'git-fetch' automatically follows tags that are attached
to commits you slurp (following example set by Cogito), to
reduce the need to grab all tags to begin with.  That would not
help tags that are attached to objects that are not part of
branches you are tracking; you can use 'git-fetch --tags' for
them.

^ permalink raw reply

* Re: [PATCH 1/3] Call extended-semantics commands through variables.
From: Junio C Hamano @ 2006-02-11 23:32 UTC (permalink / raw)
  To: Jason Riedy; +Cc: git
In-Reply-To: <4230.1139699411@lotus.CS.Berkeley.EDU>

Jason Riedy <ejr@EECS.Berkeley.EDU> writes:

> And I worry about using different programs in different 
> scripts, so I just changed all of them.

That's a good point.  I stand corrected.

^ permalink raw reply

* Re: [PATCH 1/3] Call extended-semantics commands through variables.
From: Jason Riedy @ 2006-02-11 23:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vwtg2mmx5.fsf@assigned-by-dhcp.cox.net>

And Junio C Hamano writes:
 - The use of FIND or CPIO in git clone does not need -0 (and the
 - code does not use -0, nor your patch adds -0), so you should not
 - have to override them this way.

I'm not sure what's up with cpio, but git causes pkgsrc's 
default cpio to segfault on my Solaris machines.  It's 
easier to point CPIO at a different cpio than debug a 
utility I've never really used.  ;)  I thought it was
git-clone breaking in the tests, but it could have been
git-merge.  I'll check again when I get a chance.

And I worry about using different programs in different 
scripts, so I just changed all of them.

 - (BTW, you got count-objects one wrong; there is a leftover 
 - GIT_FIND there).

Friday afternoon patching, sorry.  That also means either 
that count-objects has no test cases or that branch of it 
is not exercised by tests.

 - The places we _do_ use -0 currently should be converted
 - with something like your patch to use -0 capable version of the
 - tool.

Again, I'm not very comfortable using different finds or 
xargs in different places.  But if you want, I'll re-do the
patch with just those locations changed, and I'll double-
check which cpio invocation is breaking.

 - [...] but nobody should be using rsync transfer anyway,
 - so...

Is there a better way of grabbing all the tags now?  I haven't
kept track, as I haven't had to do that in a while.

Jason

^ permalink raw reply

* Re: [PATCH] git-commit: Only call git-rerere if $GIT_DIR/rr-cache exists
From: Junio C Hamano @ 2006-02-11 21:16 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, junkio
In-Reply-To: <Pine.LNX.4.63.0602111602270.26560@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> This makes an error go away if you do not have Digest.pm installed, but
> do not intend to make use of git-rerere anyway.

Ah, I did not realize that dependency.  Thanks.
I think there is at least one more place it does it.

^ permalink raw reply

* Re: gitweb using "--cc"?
From: Junio C Hamano @ 2006-02-11 20:59 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git
In-Reply-To: <e5bfff550602110117i7b742351m14e908de10aac12c@mail.gmail.com>

Marco Costalba <mcostalba@gmail.com> writes:

> Please _do not_ change this behaviour to make -m a no-op as stated in
> "diff-tree -c raw output" patch message
> (ee63802422af14e43eccce3c6dc4150a27ceb1a3).
>
> qgit has the possibility to switch from "see all merge files"
> to "see interesting only", so we really need that difference
> between 'git-diff-tree -r' and 'git-diff-tree -r -m'

Let me make sure I am not misreading you.  You are proposing to
revert making -m a no-op.  So '-r' and '-r -m' would do
different things, like illustrated in the log message below.

All of the above combinations of flags produces the same result
for non-merge commit, by the way.

Ack, or did I grossly misunderstand what you wanted?

-- >8 --
[PATCH] diff-tree: do not default to -c

Marco says it breaks qgit.  This makes the flags a bit more
orthogonal.

  $ git-diff-tree -r --abbrev ca18

    No output from this command because you asked to skip merge by
    not having -m there.

  $ git-diff-tree -r -m --abbrev ca18
  ca182053c7710a286d72102f4576cf32e0dafcfb
  :100644 100644 538d21d... 59042d1... M	Makefile
  :100644 100644 410b758... 6c47c3a... M	entry.c
  ca182053c7710a286d72102f4576cf32e0dafcfb
  :100644 100644 30479b4... 59042d1... M	Makefile

    The same "independent sets of diff" as before without -c.

  $ git-diff-tree -r -m -c --abbrev ca18
  ca182053c7710a286d72102f4576cf32e0dafcfb
  ::100644 100644 100644 538d21d... 30479b4... 59042d1... MM	Makefile

    Combined.

  $ git-diff-tree -r -c --abbrev ca18
  ca182053c7710a286d72102f4576cf32e0dafcfb
  ::100644 100644 100644 538d21d... 30479b4... 59042d1... MM	Makefile

    Asking for combined without -m does not make sense, so -c
    implies -m.

We need to supply -c as default to whatchanged, which is a
one-liner.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---
diff --git a/diff-tree.c b/diff-tree.c
index b170b03..f55a35a 100644
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -6,7 +6,7 @@ static int show_root_diff = 0;
 static int no_commit_id = 0;
 static int verbose_header = 0;
 static int ignore_merges = 1;
-static int combine_merges = 1;
+static int combine_merges = 0;
 static int dense_combined_merges = 0;
 static int read_stdin = 0;
 static int always_show_header = 0;
@@ -248,7 +248,7 @@ int main(int argc, const char **argv)
 			continue;
 		}
 		if (!strcmp(arg, "-m")) {
-			combine_merges = ignore_merges = 0;
+			ignore_merges = 0;
 			continue;
 		}
 		if (!strcmp(arg, "-c")) {
diff --git a/git-whatchanged.sh b/git-whatchanged.sh
index 574fc35..1fb9feb 100755
--- a/git-whatchanged.sh
+++ b/git-whatchanged.sh
@@ -10,7 +10,7 @@ case "$0" in
 	count=
 	test -z "$diff_tree_flags" &&
 		diff_tree_flags=$(git-repo-config --get whatchanged.difftree)
-	diff_tree_default_flags='-M --abbrev' ;;
+	diff_tree_default_flags='-c -M --abbrev' ;;
 *show)
 	count=-n1
 	test -z "$diff_tree_flags" &&

^ permalink raw reply related

* Re: gitweb using "--cc"?
From: Junio C Hamano @ 2006-02-11 19:32 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git
In-Reply-To: <e5bfff550602110117i7b742351m14e908de10aac12c@mail.gmail.com>

Marco Costalba <mcostalba@gmail.com> writes:

> On 2/9/06, Junio C Hamano <junkio@cox.net> wrote:
>>
>> Does it matter?  I presume that a Porcelain that cares would
>> rather use the traditional "diff-tree -m -r" to look at diff
>> with each parent.  I dunno.
>
> Yes, please preserve this behaviour.
>...
> Please _do not_ change this behaviour to make -m a no-op as stated in
> "diff-tree -c raw output" patch message
> (ee63802422af14e43eccce3c6dc4150a27ceb1a3).

The one you pulled already contains another one to fix that
ee6380 change done by gittus.  What "diff-tree -r -m ca1820"
shows should be the same as traditional "diff-tree -r -m ca1820"
output.

What is different is "diff-tree ca1820".  It used to show
*nothing* only because it is a merge.  It now defaults to show
"diff-tree -c ca1820".

For the sake of backward compatibility we could change it to not
output anything, but I sort of feel that is backwards.  If a
Porcelain wants raw-diff for 1 (or more) parents, "diff-tree -r
-m" has been the way to do so before the ee6380 change, and that
output has not changed (well ee6380 might have changed it but
now it is fixed).

^ permalink raw reply

* Re: Two crazy proposals for changing git's diff commands
From: Junio C Hamano @ 2006-02-11 19:25 UTC (permalink / raw)
  To: Kent Engstrom; +Cc: git
In-Reply-To: <m31wybknuv.fsf@athena.unit.liu.se>

Kent Engstrom <kent@lysator.liu.se> writes:

> Junio C Hamano <junkio@cox.net> writes:
>> *1* Often I find myself doing that while rewriting messy
>> development history.  For example, I start doing some
>> work without knowing exactly where it leads, and end up with a
>> history like this:
>
> Perhaps this discussion could be added under Documentation/howto?

Maybe.

Traditionally, person who feels that way proofreads and
copyedits the original posting and submits a patch ;-).

^ permalink raw reply

* [PATCH] Optimization for git-rev-list --objects
From: Alexandre Julliard @ 2006-02-11 19:14 UTC (permalink / raw)
  To: git


When building a large list of objects, most of the time is spent in
created_object() inserting the objects in the sorted list. This patch
splits the list in 256 sublists to reduce the impact of the O(n^2)
list insertion.

On the WineHQ repository (220,000 objects), the patch reduces the time
needed for a 'git-rev-list --objects HEAD' from 2 minutes to 20 seconds.

Signed-off-by: Alexandre Julliard <julliard@winehq.org>

---

 fsck-objects.c |   66 +++++++++++++++++++++++++++++---------------------------
 name-rev.c     |    9 ++++----
 object.c       |   24 ++++++++++----------
 object.h       |    4 ++-
 4 files changed, 53 insertions(+), 50 deletions(-)

68f8fcb7f5883ecc44a80e822e0b78ee8efd9eb9
diff --git a/fsck-objects.c b/fsck-objects.c
index 9950be2..5bdb21e 100644
--- a/fsck-objects.c
+++ b/fsck-objects.c
@@ -58,45 +58,47 @@ static int objwarning(struct object *obj
 
 static void check_connectivity(void)
 {
-	int i;
+	int i, j;
 
 	/* Look up all the requirements, warn about missing objects.. */
-	for (i = 0; i < nr_objs; i++) {
-		struct object *obj = objs[i];
+	for (i = 0; i < 256; i++) {
+		for (j = 0; j < nr_objs[i]; j++) {
+			struct object *obj = objs[i][j];
 
-		if (!obj->parsed) {
-			if (!standalone && has_sha1_file(obj->sha1))
-				; /* it is in pack */
-			else
-				printf("missing %s %s\n",
-				       obj->type, sha1_to_hex(obj->sha1));
-			continue;
-		}
+			if (!obj->parsed) {
+				if (!standalone && has_sha1_file(obj->sha1))
+					; /* it is in pack */
+				else
+					printf("missing %s %s\n",
+					       obj->type, sha1_to_hex(obj->sha1));
+				continue;
+			}
 
-		if (obj->refs) {
-			const struct object_refs *refs = obj->refs;
-			unsigned j;
-			for (j = 0; j < refs->count; j++) {
-				struct object *ref = refs->ref[j];
-				if (ref->parsed ||
-				    (!standalone && has_sha1_file(ref->sha1)))
-					continue;
-				printf("broken link from %7s %s\n",
-				       obj->type, sha1_to_hex(obj->sha1));
-				printf("              to %7s %s\n",
-				       ref->type, sha1_to_hex(ref->sha1));
+			if (obj->refs) {
+				const struct object_refs *refs = obj->refs;
+				unsigned k;
+				for (k = 0; k < refs->count; k++) {
+					struct object *ref = refs->ref[k];
+					if (ref->parsed ||
+					    (!standalone && has_sha1_file(ref->sha1)))
+						continue;
+					printf("broken link from %7s %s\n",
+					       obj->type, sha1_to_hex(obj->sha1));
+					printf("              to %7s %s\n",
+					       ref->type, sha1_to_hex(ref->sha1));
+				}
 			}
-		}
 
-		if (show_unreachable && !(obj->flags & REACHABLE)) {
-			printf("unreachable %s %s\n",
-			       obj->type, sha1_to_hex(obj->sha1));
-			continue;
-		}
+			if (show_unreachable && !(obj->flags & REACHABLE)) {
+				printf("unreachable %s %s\n",
+				       obj->type, sha1_to_hex(obj->sha1));
+				continue;
+			}
 
-		if (!obj->used) {
-			printf("dangling %s %s\n", obj->type, 
-			       sha1_to_hex(obj->sha1));
+			if (!obj->used) {
+				printf("dangling %s %s\n", obj->type, 
+				       sha1_to_hex(obj->sha1));
+			}
 		}
 	}
 }
diff --git a/name-rev.c b/name-rev.c
index bbadb91..9a8d086 100644
--- a/name-rev.c
+++ b/name-rev.c
@@ -230,11 +230,12 @@ int main(int argc, char **argv)
 				fwrite(p_start, p - p_start, 1, stdout);
 		}
 	} else if (all) {
-		int i;
+		int i, j;
 
-		for (i = 0; i < nr_objs; i++)
-			printf("%s %s\n", sha1_to_hex(objs[i]->sha1),
-					get_rev_name(objs[i]));
+		for (i = 0; i < 256; i++)
+			for (j = 0; j < nr_objs[i]; j++)
+				printf("%s %s\n", sha1_to_hex(objs[i][j]->sha1),
+						get_rev_name(objs[i][j]));
 	} else
 		for ( ; revs; revs = revs->next)
 			printf("%s %s\n", revs->name, get_rev_name(revs->item));
diff --git a/object.c b/object.c
index 1577f74..fcd4d0c 100644
--- a/object.c
+++ b/object.c
@@ -5,19 +5,19 @@
 #include "commit.h"
 #include "tag.h"
 
-struct object **objs;
-int nr_objs;
-static int obj_allocs;
+struct object **objs[256];
+int nr_objs[256];
+static int obj_allocs[256];
 
 int track_object_refs = 1;
 
 static int find_object(const unsigned char *sha1)
 {
-	int first = 0, last = nr_objs;
+	int first = 0, last = nr_objs[*sha1];
 
         while (first < last) {
                 int next = (first + last) / 2;
-                struct object *obj = objs[next];
+                struct object *obj = objs[*sha1][next];
                 int cmp;
 
                 cmp = memcmp(sha1, obj->sha1, 20);
@@ -36,7 +36,7 @@ struct object *lookup_object(const unsig
 {
 	int pos = find_object(sha1);
 	if (pos >= 0)
-		return objs[pos];
+		return objs[*sha1][pos];
 	return NULL;
 }
 
@@ -54,17 +54,17 @@ void created_object(const unsigned char 
 		die("Inserting %s twice\n", sha1_to_hex(sha1));
 	pos = -pos-1;
 
-	if (obj_allocs == nr_objs) {
-		obj_allocs = alloc_nr(obj_allocs);
-		objs = xrealloc(objs, obj_allocs * sizeof(struct object *));
+	if (obj_allocs[*sha1] == nr_objs[*sha1]) {
+		obj_allocs[*sha1] = alloc_nr(obj_allocs[*sha1]);
+		objs[*sha1] = xrealloc(objs[*sha1], obj_allocs[*sha1] * sizeof(struct object *));
 	}
 
 	/* Insert it into the right place */
-	memmove(objs + pos + 1, objs + pos, (nr_objs - pos) * 
+	memmove(objs[*sha1] + pos + 1, objs[*sha1] + pos, (nr_objs[*sha1] - pos) * 
 		sizeof(struct object *));
 
-	objs[pos] = obj;
-	nr_objs++;
+	objs[*sha1][pos] = obj;
+	nr_objs[*sha1]++;
 }
 
 struct object_refs *alloc_object_refs(unsigned count)
diff --git a/object.h b/object.h
index 0e76182..dcd6ac4 100644
--- a/object.h
+++ b/object.h
@@ -23,8 +23,8 @@ struct object {
 };
 
 extern int track_object_refs;
-extern int nr_objs;
-extern struct object **objs;
+extern int nr_objs[256];
+extern struct object **objs[256];
 
 /** Internal only **/
 struct object *lookup_object(const unsigned char *sha1);
-- 
1.1.6.g68f8


-- 
Alexandre Julliard
julliard@winehq.org

^ permalink raw reply related

* Re: Make "git clone" less of a deathly quiet experience
From: Keith Packard @ 2006-02-11 19:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: keithp, Junio C Hamano, Git Mailing List, Petr Baudis
In-Reply-To: <Pine.LNX.4.64.0602110943170.3691@g5.osdl.org>

[-- Attachment #1: Type: text/plain, Size: 674 bytes --]

On Sat, 2006-02-11 at 09:45 -0800, Linus Torvalds wrote:

> More importantly, it really wouldn't have helped that much in this 
> situation. At least for me, the network is 90% of the problem, the 
> pack-file generation is at most 10%. So cached packfiles really only 
> matter for server-side problems (high CPU load, or lack of memory, or 
> heavy disk activity).

I'd like to see git use less CPU than CVS does on my distribution host;
some mechanism for re-using either existing or cached packs would help a
whole lot with that. The alternative is to see people switch to rsync
instead, which seems like a far worse idea.   

-- 
keith.packard@intel.com

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: Make "git clone" less of a deathly quiet experience
From: Linus Torvalds @ 2006-02-11 19:04 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Git Mailing List, Junio C Hamano, Petr Baudis
In-Reply-To: <20060211183959.GA9984@steel.home>



On Sat, 11 Feb 2006, Alex Riesen wrote:
> 
> I'd put a \n before finish_pack to make it nicer.

Yes.

Duh. I did all my testing with "time git clone ..", so I had the extra \n 
added by the fact that "time" itself will do it.

Side comment: the pack preparation stage seems to take about 90s for the 
kernel. Of course, that will keep growing with history, but so will 
probably the pack-size, so percentage-wise, the 90% / 10% thing is likely 
to hold for DSL (yes, DSL gets faster too, but so do CPU ;).

That 90s is unquestionably irritating, though, so we do want to either 
cache them, or add similar "I'm working on it" output to that phase too.

		Linus

^ permalink raw reply

* Make "git clone" pack-fetching download statistics better
From: Linus Torvalds @ 2006-02-11 18:43 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


Make "git clone" pack-fetching download statistics better

Average it out over a few events to make the numbers stable, and fix the
silly usec->binary-ms conversion.

Yeah, yeah, it's arguably eye-candy to keep the user calm, but let's do 
that right.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

This is obviously against my previous diff to do the verbose output in the 
first place. If you want a combined diff, just holler.

This makes the download speed be totally stable for me on an otherwise 
idle DSL connection (146 kB/s, which sounds right, in case anybody cares).

diff --git a/fetch-clone.c b/fetch-clone.c
index d8216cb..da1b3ff 100644
--- a/fetch-clone.c
+++ b/fetch-clone.c
@@ -130,12 +130,35 @@ int receive_unpack_pack(int fd[2], const
 	die("git-unpack-objects died of unnatural causes %d", status);
 }
 
+/*
+ * We average out the download speed over this many "events", where
+ * an event is a minimum of about half a second. That way, we get
+ * a reasonably stable number.
+ */
+#define NR_AVERAGE (4)
+
+/*
+ * A "binary msec" is a power-of-two-msec, aka 1/1024th of a second.
+ * Keeing the time in that format means that "bytes / msecs" means
+ * is the same as kB/s (modulo rounding).
+ *
+ * 1000512 is a magic number (usecs in a second, rounded up by half
+ * of 1024, to make "rounding" come out right ;)
+ */
+#define usec_to_binarymsec(x) ((int)(x) / (1000512 >> 10))
+
 int receive_keep_pack(int fd[2], const char *me, int quiet)
 {
 	char tmpfile[PATH_MAX];
 	int ofd, ifd;
 	unsigned long total;
 	static struct timeval prev_tv;
+	struct average {
+		unsigned long bytes;
+		unsigned long time;
+	} download[NR_AVERAGE] = { {0, 0}, };
+	unsigned long avg_bytes, avg_time;
+	int idx = 0;
 
 	ifd = fd[0];
 	snprintf(tmpfile, sizeof(tmpfile),
@@ -146,6 +169,8 @@ int receive_keep_pack(int fd[2], const c
 
 	gettimeofday(&prev_tv, NULL);
 	total = 0;
+	avg_bytes = 0;
+	avg_time = 0;
 	while (1) {
 		char buf[8192];
 		ssize_t sz, wsz, pos;
@@ -184,14 +209,27 @@ int receive_keep_pack(int fd[2], const c
 			gettimeofday(&tv, NULL);
 			msecs = tv.tv_sec - prev_tv.tv_sec;
 			msecs <<= 10;
-			msecs += (int)(tv.tv_usec - prev_tv.tv_usec) >> 10;
+			msecs += usec_to_binarymsec(tv.tv_usec - prev_tv.tv_usec);
+
 			if (msecs > 500) {
 				prev_tv = tv;
 				last = total;
-				fprintf(stderr, "%4lu.%03luMB  (%lu kB/s)        \r",
+
+				/* Update averages ..*/
+				avg_bytes += diff;
+				avg_time += msecs;
+				avg_bytes -= download[idx].bytes;
+				avg_time -= download[idx].time;
+				download[idx].bytes = diff;
+				download[idx].time = msecs;
+				idx++;
+				if (idx >= NR_AVERAGE)
+					idx = 0;
+
+				fprintf(stderr, "%4lu.%03luMB  (%lu kB/s)      \r",
 					total >> 20,
 					1000*((total >> 10) & 1023)>>10,
-					diff / msecs );
+					avg_bytes / avg_time );
 			}
 		}
 	}

^ permalink raw reply related

* Fix fetch-clone in the presense of signals
From: Linus Torvalds @ 2006-02-11 18:41 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


We shouldn't fail a fetch just because a signal might have interrupted
the read.

Normally, we don't install any signal handlers, so EINTR really shouldn't 
happen. That said, really old versions of Linux will interrupt an 
interruptible system call even for signals that turn out to be ignored 
(SIGWINCH is the classic example - resizing your xterm would cause it). 
The same might well be true elsewhere too.

Also, since receive_keep_pack() doesn't control the caller, it can't know 
that no signal handlers exist.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

diff --git a/fetch-clone.c b/fetch-clone.c
index b67d976..d8216cb 100644
--- a/fetch-clone.c
+++ b/fetch-clone.c
@@ -153,10 +153,13 @@ int receive_keep_pack(int fd[2], const c
 		if (sz == 0)
 			break;
 		if (sz < 0) {
-			error("error reading pack (%s)", strerror(errno));
-			close(ofd);
-			unlink(tmpfile);
-			return -1;
+			if (errno != EINTR && errno != EAGAIN) {
+				error("error reading pack (%s)", strerror(errno));
+				close(ofd);
+				unlink(tmpfile);
+				return -1;
+			}
+			sz = 0;
 		}
 		pos = 0;
 		while (pos < sz) {

^ permalink raw reply related

* Re: Make "git clone" less of a deathly quiet experience
From: Alex Riesen @ 2006-02-11 18:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano, Petr Baudis
In-Reply-To: <Pine.LNX.4.64.0602102018250.3691@g5.osdl.org>

Linus Torvalds, Sat, Feb 11, 2006 05:31:09 +0100:
> So with this patch, I get something like this on my DSL line:
> 
> 	[torvalds@g5 ~]$ time git clone master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 clone-test
> 	Packing 188543 objects
> 	  48.398MB  (154 kB/s)

I get this:

    $ git clone . ../cloned
    Packing 15440 objects
    $ 2 kB/s)

I'd put a \n before finish_pack to make it nicer.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>

diff --git a/fetch-clone.c b/fetch-clone.c
index b67d976..37141e9 100644
--- a/fetch-clone.c
+++ b/fetch-clone.c
@@ -193,5 +193,7 @@ int receive_keep_pack(int fd[2], const c
 		}
 	}
 	close(ofd);
+	if ( !quiet )
+	    fputc('\n', stderr);
 	return finish_pack(tmpfile, me);
 }

^ permalink raw reply related

* Experiences with git-svnimport
From: Christian Biesinger @ 2006-02-11 18:22 UTC (permalink / raw)
  To: git

Hi,

since I now got git-svnimport to work, I thought I'd post my experiences 
with it, as that may be helpful for others.

I had various setups here. One of them was the usual (?) one, where I 
had a subversion repository at https://servername/svn, containing three 
projects.

One of them followed the usual structure:
   projectname/trunk
   projectname/branches/...

Now, the straightforward way would be:
   git-svnimport -C somedir https://servername/svn projectname

It turns out that this is equivalent to
   git-svnimport -C somedir https://servername/svn/projectname
if the -d or -D option is not specified, and those can't be used with https.

And that doesn't work. The problem is that the SVN repository contains 
urls relative to the top of the repository, i.e. /projectname/trunk/...
That doesn't match the /trunk/ that's expected by svnimport.

The solution for this: Apply the patch I sent earlier and use:
   git-svnimport -v -C somedir -T projectname/trunk -b 
projectname/branches https://servername/svn/

It turned out that the other two projects on this server didn't use the 
trunk/branches structure (I forgot about that when setting them up), but 
that was not a problem: I could just do:

   git-svnimport -v -C somedir -T projectname https://servername/svn/

This also works fine for a svn+ssh:// repository.

Now, maybe my setup was unusual, and I should instead have used separate 
repositories for different projects. However, this worked for me, and my 
experiences in getting this to work might be of interest to others.

^ permalink raw reply

* Re: Make "git clone" less of a deathly quiet experience
From: Linus Torvalds @ 2006-02-11 17:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Petr Baudis
In-Reply-To: <7vwtg2o37c.fsf@assigned-by-dhcp.cox.net>



On Fri, 10 Feb 2006, Junio C Hamano wrote:
> 
> It probably should default to quiet if (!isatty(1)).

Sounds fine. isatty(2), though, since we use stderr for these messages 
(stdout is usually the data-stream).

> The real improvement, independent of this client-side patch,
> would be to reuse recently generated packs, but that needs
> writable cache directory on the server side.

More importantly, it really wouldn't have helped that much in this 
situation. At least for me, the network is 90% of the problem, the 
pack-file generation is at most 10%. So cached packfiles really only 
matter for server-side problems (high CPU load, or lack of memory, or 
heavy disk activity).

So the problems really are very independent.

			Linus

^ permalink raw reply

* Re: Make "git clone" less of a deathly quiet experience
From: Linus Torvalds @ 2006-02-11 17:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Petr Baudis
In-Reply-To: <7vslqqo341.fsf@assigned-by-dhcp.cox.net>



On Fri, 10 Feb 2006, Junio C Hamano wrote:
> 
> Would you suggest doing that with "checkout-index -v", that
> shows "1 path1\r2 path2\r3 path3\r...\rDone.\n"?

Not if it shows every single path.

When going tty output, we should be careful to limit it to not do tons and 
tons of lines. The download output does gettimeofday to limit itself to 
max 2 times per sec, and the percentage output of git-unpack-objects 
similarly limits itself so that it never spews _tons_ of stuff to the 
terminal.

Under many loads, the terminal will be a lot slower than actually writing 
a file ("context switch to gnome-term + context switch to X + set up 
complex text output").

		Linus

^ permalink raw reply

* Re: Make "git clone" less of a deathly quiet experience
From: Alex Riesen @ 2006-02-11 17:24 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, Linus Torvalds, Git Mailing List
In-Reply-To: <20060211133340.GS31278@pasky.or.cz>

Petr Baudis, Sat, Feb 11, 2006 14:33:40 +0100:
> > It probably should default to quiet if (!isatty(1)).
> 
> isatty(2) or something, 1 is in practice always a ref generator. Perhaps
> it would be better not to clutter stderr, though; what about directly
> opening /dev/tty? Does Cygwin support that?

It can't. Windows has no terminals (as in "none at all"). It has a
Console, which is a special kind of window attached to an application
and where the unbuffered stdout and stderr are magically redirected.

A test for is stdout/err is a tty can only check if the process has
the console attached, and an attempt to open it for writing will
probably just create the thing.

^ permalink raw reply

* Re: [PATCH] Use a relative path for SVN importing
From: Christian Biesinger @ 2006-02-11 17:04 UTC (permalink / raw)
  To: git
In-Reply-To: <1139672651713-git-send-email-cbiesinger@web.de>

Christian Biesinger wrote:
> The absolute path (with the leading slash) breaks SVN importing, because it then
> looks for /trunk/... instead of /svn/trunk/... (in my case, the repository URL
> was https://servername/svn/)

I also tested a svn+ssh case and that works too (it also worked before, 
oddly enough).

^ permalink raw reply

* [PATCH] Use a relative path for SVN importing
From: Christian Biesinger @ 2006-02-11 15:44 UTC (permalink / raw)
  To: git

The absolute path (with the leading slash) breaks SVN importing, because it then
looks for /trunk/... instead of /svn/trunk/... (in my case, the repository URL
was https://servername/svn/)

Signed-off-by: Christian Biesinger <cbiesinger@web.de>

---

 git-svnimport.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

7718b5a5fec01c0774f6b9cdc84f908e68b403ac
diff --git a/git-svnimport.perl b/git-svnimport.perl
index b6799d8..f17d5a2 100755
--- a/git-svnimport.perl
+++ b/git-svnimport.perl
@@ -318,7 +318,7 @@ sub get_file($$$) {
 			die $res->status_line." at $url\n";
 		}
 	} else {
-		$name = $svn->file("/$svnpath",$rev);
+		$name = $svn->file("$svnpath",$rev);
 		return undef unless defined $name;
 	}
 
-- 
1.1.6.g486a-dirty

^ permalink raw reply related

* gitk: searching on filenames?
From: walt @ 2006-02-11 15:42 UTC (permalink / raw)
  To: git

A real example from this morning:  I noticed when pulling from
Linus that kernel/sched.c had been updated, so I wanted to look
at that particular commit.

If I do 'gitk kernel/sched.c' then it does what I expect.

But, if I just start gitk with no arguments and then attempt
to search for kernel/sched.c it eventually stops with an error
saying "can't find diffs for <SHA1> <SHA2>".

Is this the expected behavior?  (I have the 'Files' field
selected in the gitk search bar.)

Thanks!

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox