git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Remove more parsers
@ 2006-01-29 19:04 Daniel Barkalow
  2006-01-29 19:04 ` [PATCH 1/3] Use struct tree in tar-tree Daniel Barkalow
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Daniel Barkalow @ 2006-01-29 19:04 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

I found some more open-coded tree parsers and a commit parser. These 
changes shouldn't affect behavior meaningfully, and all of the tests pass. 
I did notice, however, that the rev-list functionality of pruning by path 
isn't actually tested in the tests; I forgot to update it and it didn't 
cause any tests to fail. I also couldn't figure out if 
"same_tree_as_empty" has important side effects, so I updated its use of 
diff_tree; in order to do what the name says, "return t1 && !t1->entries" 
would suffice.

There's also a parser in convert-objects, but it may be best to keep it, 
since it may want to deal with things which are now considered invalid, 
and would be rejected by the normal parser.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] Use struct tree in tar-tree
  2006-01-29 19:04 [PATCH 0/3] Remove more parsers Daniel Barkalow
@ 2006-01-29 19:04 ` Daniel Barkalow
  2006-01-29 19:05 ` [PATCH 2/3] Use struct commit " Daniel Barkalow
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Daniel Barkalow @ 2006-01-29 19:04 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

It was using an open-coded tree parser; use a struct tree instead.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>

---

 tar-tree.c |   48 +++++++++++++++++++++++-------------------------
 1 files changed, 23 insertions(+), 25 deletions(-)

d2a4152dfd1b79f32a28ca4b2deb202a4e928441
diff --git a/tar-tree.c b/tar-tree.c
index f749d4b..6219754 100644
--- a/tar-tree.c
+++ b/tar-tree.c
@@ -3,6 +3,7 @@
  */
 #include <time.h>
 #include "cache.h"
+#include "tree.h"
 
 #define RECORDSIZE	(512)
 #define BLOCKSIZE	(RECORDSIZE * 20)
@@ -334,40 +335,37 @@ static void write_header(const unsigned 
 	write_if_needed();
 }
 
-static void traverse_tree(void *buffer, unsigned long size,
+static void traverse_tree(struct tree *tree,
 			  struct path_prefix *prefix)
 {
 	struct path_prefix this_prefix;
+	struct tree_entry_list *item;
 	this_prefix.prev = prefix;
 
-	while (size) {
-		int namelen = strlen(buffer)+1;
+	parse_tree(tree);
+	item = tree->entries;
+
+	while (item) {
 		void *eltbuf;
 		char elttype[20];
 		unsigned long eltsize;
-		unsigned char *sha1 = buffer + namelen;
-		char *path = strchr(buffer, ' ') + 1;
-		unsigned int mode;
-
-		if (size < namelen + 20 || sscanf(buffer, "%o", &mode) != 1)
-			die("corrupt 'tree' file");
-		if (S_ISDIR(mode) || S_ISREG(mode))
-			mode |= (mode & 0100) ? 0777 : 0666;
-		buffer = sha1 + 20;
-		size -= namelen + 20;
 
-		eltbuf = read_sha1_file(sha1, elttype, &eltsize);
+		eltbuf = read_sha1_file(item->item.any->sha1, 
+					elttype, &eltsize);
 		if (!eltbuf)
-			die("cannot read %s", sha1_to_hex(sha1));
-		write_header(sha1, TYPEFLAG_AUTO, basedir, prefix, path,
-		             mode, eltbuf, eltsize);
-		if (!strcmp(elttype, "tree")) {
-			this_prefix.name = path;
-			traverse_tree(eltbuf, eltsize, &this_prefix);
-		} else if (!strcmp(elttype, "blob") && !S_ISLNK(mode)) {
+			die("cannot read %s", 
+			    sha1_to_hex(item->item.any->sha1));
+		write_header(item->item.any->sha1, TYPEFLAG_AUTO, basedir, 
+			     prefix, item->name,
+		             item->mode, eltbuf, eltsize);
+		if (item->directory) {
+			this_prefix.name = item->name;
+			traverse_tree(item->item.tree, &this_prefix);
+		} else if (!item->symlink) {
 			write_blocked(eltbuf, eltsize);
 		}
 		free(eltbuf);
+		item = item->next;
 	}
 }
 
@@ -404,6 +402,7 @@ int main(int argc, char **argv)
 	unsigned char commit_sha1[20];
 	void *buffer;
 	unsigned long size;
+	struct tree *tree;
 
 	setup_git_directory();
 
@@ -425,8 +424,8 @@ int main(int argc, char **argv)
 		archive_time = commit_time(buffer, size);
 		free(buffer);
 	}
-	buffer = read_object_with_reference(sha1, "tree", &size, NULL);
-	if (!buffer)
+	tree = parse_tree_indirect(sha1);
+	if (!tree)
 		die("not a reference to a tag, commit or tree object: %s",
 		    sha1_to_hex(sha1));
 	if (!archive_time)
@@ -434,8 +433,7 @@ int main(int argc, char **argv)
 	if (basedir)
 		write_header((unsigned char *)"0", TYPEFLAG_DIR, NULL, NULL,
 			basedir, 040777, NULL, 0);
-	traverse_tree(buffer, size, NULL);
-	free(buffer);
+	traverse_tree(tree, NULL);
 	write_trailer();
 	return 0;
 }
-- 
1.0.GIT

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] Use struct commit in tar-tree
  2006-01-29 19:04 [PATCH 0/3] Remove more parsers Daniel Barkalow
  2006-01-29 19:04 ` [PATCH 1/3] Use struct tree in tar-tree Daniel Barkalow
@ 2006-01-29 19:05 ` Daniel Barkalow
  2006-01-29 19:05 ` [PATCH 3/3] Use struct tree in diff-tree Daniel Barkalow
  2006-01-29 20:26 ` [PATCH 0/3] Remove more parsers Junio C Hamano
  3 siblings, 0 replies; 11+ messages in thread
From: Daniel Barkalow @ 2006-01-29 19:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

It was open-coding getting the commit date from a commit.

Signed-off-by: Daniel Barkalow <barkalow@iabervon>

---

 tar-tree.c |   41 ++++++-----------------------------------
 1 files changed, 6 insertions(+), 35 deletions(-)

ca523a725295fff6f1456cd7207f6b24f8c2e843
diff --git a/tar-tree.c b/tar-tree.c
index 6219754..d36baed 100644
--- a/tar-tree.c
+++ b/tar-tree.c
@@ -4,6 +4,7 @@
 #include <time.h>
 #include "cache.h"
 #include "tree.h"
+#include "commit.h"
 
 #define RECORDSIZE	(512)
 #define BLOCKSIZE	(RECORDSIZE * 20)
@@ -369,39 +370,10 @@ static void traverse_tree(struct tree *t
 	}
 }
 
-/* get commit time from committer line of commit object */
-static time_t commit_time(void * buffer, unsigned long size)
-{
-	time_t result = 0;
-	char *p = buffer;
-
-	while (size > 0) {
-		char *endp = memchr(p, '\n', size);
-		if (!endp || endp == p)
-			break;
-		*endp = '\0';
-		if (endp - p > 10 && !memcmp(p, "committer ", 10)) {
-			char *nump = strrchr(p, '>');
-			if (!nump)
-				break;
-			nump++;
-			result = strtoul(nump, &endp, 10);
-			if (*endp != ' ')
-				result = 0;
-			break;
-		}
-		size -= endp - p - 1;
-		p = endp + 1;
-	}
-	return result;
-}
-
 int main(int argc, char **argv)
 {
 	unsigned char sha1[20];
-	unsigned char commit_sha1[20];
-	void *buffer;
-	unsigned long size;
+	struct commit *commit;
 	struct tree *tree;
 
 	setup_git_directory();
@@ -418,11 +390,10 @@ int main(int argc, char **argv)
 		usage(tar_tree_usage);
 	}
 
-	buffer = read_object_with_reference(sha1, "commit", &size, commit_sha1);
-	if (buffer) {
-		write_global_extended_header(commit_sha1);
-		archive_time = commit_time(buffer, size);
-		free(buffer);
+	commit = lookup_commit_reference(sha1);
+	if (commit) {
+		write_global_extended_header(commit->object.sha1);
+		archive_time = commit->date;
 	}
 	tree = parse_tree_indirect(sha1);
 	if (!tree)
-- 
1.0.GIT

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] Use struct tree in diff-tree
  2006-01-29 19:04 [PATCH 0/3] Remove more parsers Daniel Barkalow
  2006-01-29 19:04 ` [PATCH 1/3] Use struct tree in tar-tree Daniel Barkalow
  2006-01-29 19:05 ` [PATCH 2/3] Use struct commit " Daniel Barkalow
@ 2006-01-29 19:05 ` Daniel Barkalow
  2006-01-31 16:53   ` Linus Torvalds
  2006-01-29 20:26 ` [PATCH 0/3] Remove more parsers Junio C Hamano
  3 siblings, 1 reply; 11+ messages in thread
From: Daniel Barkalow @ 2006-01-29 19:05 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

It had been open-coding a tree parser. This updates the programs that
call diff_tree() to send it the struct tree instead of a buffer and
size.

Signed-off-by: Daniel Barkalow

---

 diff-tree.c |   13 ++----
 diff.h      |    9 +---
 rev-list.c  |   13 ------
 tree-diff.c |  123 ++++++++++++++++++++++++++---------------------------------
 4 files changed, 63 insertions(+), 95 deletions(-)

1530551cfd7dbecfe3258cab09b01f10a9b8549d
diff --git a/diff-tree.c b/diff-tree.c
index 6593a69..d64cba0 100644
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -49,18 +49,13 @@ static int diff_tree_sha1_top(const unsi
 static int diff_root_tree(const unsigned char *new, const char *base)
 {
 	int retval;
-	void *tree;
-	struct tree_desc empty, real;
+	struct tree *real;
 
-	tree = read_object_with_reference(new, "tree", &real.size, NULL);
-	if (!tree)
+	real = parse_tree_indirect(new);
+	if (!real)
 		die("unable to read root tree (%s)", sha1_to_hex(new));
-	real.buf = tree;
 
-	empty.buf = "";
-	empty.size = 0;
-	retval = diff_tree(&empty, &real, base, &diff_options);
-	free(tree);
+	retval = diff_tree(NULL, real, base, &diff_options);
 	call_diff_flush();
 	return retval;
 }
diff --git a/diff.h b/diff.h
index 9a0169c..9142c04 100644
--- a/diff.h
+++ b/diff.h
@@ -4,15 +4,12 @@
 #ifndef DIFF_H
 #define DIFF_H
 
+#include "tree.h"
+
 #define DIFF_FILE_CANON_MODE(mode) \
 	(S_ISREG(mode) ? (S_IFREG | ce_permissions(mode)) : \
 	S_ISLNK(mode) ? S_IFLNK : S_IFDIR)
 
-struct tree_desc {
-	void *buf;
-	unsigned long size;
-};
-
 struct diff_options;
 
 typedef void (*change_fn_t)(struct diff_options *options,
@@ -51,7 +48,7 @@ struct diff_options {
 };
 
 extern void diff_tree_setup_paths(const char **paths);
-extern int diff_tree(struct tree_desc *t1, struct tree_desc *t2,
+extern int diff_tree(struct tree *t1, struct tree *t2,
 		     const char *base, struct diff_options *opt);
 extern int diff_tree_sha1(const unsigned char *old, const unsigned char *new,
 			  const char *base, struct diff_options *opt);
diff --git a/rev-list.c b/rev-list.c
index 0b142c1..86da74a 100644
--- a/rev-list.c
+++ b/rev-list.c
@@ -485,23 +485,12 @@ static int compare_tree(struct tree *t1,
 static int same_tree_as_empty(struct tree *t1)
 {
 	int retval;
-	void *tree;
-	struct tree_desc empty, real;
 
 	if (!t1)
 		return 0;
 
-	tree = read_object_with_reference(t1->object.sha1, "tree", &real.size, NULL);
-	if (!tree)
-		return 0;
-	real.buf = tree;
-
-	empty.buf = "";
-	empty.size = 0;
-
 	tree_difference = 0;
-	retval = diff_tree(&empty, &real, "", &diff_opt);
-	free(tree);
+	retval = diff_tree(NULL, t1, "", &diff_opt);
 
 	return retval >= 0 && !tree_difference;
 }
diff --git a/tree-diff.c b/tree-diff.c
index 382092b..c3526d1 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -3,38 +3,18 @@
  */
 #include "cache.h"
 #include "diff.h"
+#include "tree.h"
 
 // What paths are we interested in?
 static int nr_paths = 0;
 static const char **paths = NULL;
 static int *pathlens = NULL;
 
-static void update_tree_entry(struct tree_desc *desc)
+static const unsigned char *extract(struct tree_entry_list *desc, const char **pathp, unsigned int *modep)
 {
-	void *buf = desc->buf;
-	unsigned long size = desc->size;
-	int len = strlen(buf) + 1 + 20;
-
-	if (size < len)
-		die("corrupt tree file");
-	desc->buf = buf + len;
-	desc->size = size - len;
-}
-
-static const unsigned char *extract(struct tree_desc *desc, const char **pathp, unsigned int *modep)
-{
-	void *tree = desc->buf;
-	unsigned long size = desc->size;
-	int len = strlen(tree)+1;
-	const unsigned char *sha1 = tree + len;
-	const char *path = strchr(tree, ' ');
-	unsigned int mode;
-
-	if (!path || size < len + 20 || sscanf(tree, "%o", &mode) != 1)
-		die("corrupt tree file");
-	*pathp = path+1;
-	*modep = DIFF_FILE_CANON_MODE(mode);
-	return sha1;
+	*pathp = desc->name;
+	*modep = DIFF_FILE_CANON_MODE(desc->mode);
+	return desc->item.any->sha1;
 }
 
 static char *malloc_base(const char *base, const char *path, int pathlen)
@@ -47,9 +27,11 @@ static char *malloc_base(const char *bas
 	return newbase;
 }
 
-static int show_entry(struct diff_options *opt, const char *prefix, struct tree_desc *desc, const char *base);
+static int show_entry(struct diff_options *opt, const char *prefix, struct tree_entry_list *desc, const char *base);
 
-static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const char *base, struct diff_options *opt)
+static int compare_tree_entry(struct tree_entry_list *t1, 
+			      struct tree_entry_list *t2, 
+			      const char *base, struct diff_options *opt)
 {
 	unsigned mode1, mode2;
 	const char *path1, *path2;
@@ -99,7 +81,7 @@ static int compare_tree_entry(struct tre
 	return 0;
 }
 
-static int interesting(struct tree_desc *desc, const char *base)
+static int interesting(struct tree_entry_list *desc, const char *base)
 {
 	const char *path;
 	unsigned mode;
@@ -153,36 +135,36 @@ static int interesting(struct tree_desc 
 }
 
 /* A whole sub-tree went away or appeared */
-static void show_tree(struct diff_options *opt, const char *prefix, struct tree_desc *desc, const char *base)
+static void show_tree(struct diff_options *opt, const char *prefix, 
+		      struct tree *desc, const char *base)
 {
-	while (desc->size) {
-		if (interesting(desc, base))
-			show_entry(opt, prefix, desc, base);
-		update_tree_entry(desc);
+	struct tree_entry_list *list;
+	parse_tree(desc);
+	list = desc->entries;
+	while (list) {
+		if (interesting(list, base))
+			show_entry(opt, prefix, list, base);
+		list = list->next;
 	}
 }
 
 /* A file entry went away or appeared */
-static int show_entry(struct diff_options *opt, const char *prefix, struct tree_desc *desc, const char *base)
+static int show_entry(struct diff_options *opt, const char *prefix,
+		      struct tree_entry_list *desc, const char *base)
 {
 	unsigned mode;
 	const char *path;
 	const unsigned char *sha1 = extract(desc, &path, &mode);
 
-	if (opt->recursive && S_ISDIR(mode)) {
-		char type[20];
+	if (opt->recursive && desc->directory) {
 		char *newbase = malloc_base(base, path, strlen(path));
-		struct tree_desc inner;
-		void *tree;
+		struct tree *tree;
 
-		tree = read_sha1_file(sha1, type, &inner.size);
-		if (!tree || strcmp(type, "tree"))
-			die("corrupt tree sha %s", sha1_to_hex(sha1));
+		tree = desc->item.tree;
 
-		inner.buf = tree;
-		show_tree(opt, prefix, &inner, newbase);
+		parse_tree(tree);
+		show_tree(opt, prefix, tree, newbase);
 
-		free(tree);
 		free(newbase);
 		return 0;
 	}
@@ -191,36 +173,46 @@ static int show_entry(struct diff_option
 	return 0;
 }
 
-int diff_tree(struct tree_desc *t1, struct tree_desc *t2, const char *base, struct diff_options *opt)
+int diff_tree(struct tree *tree1, struct tree *tree2, const char *base,
+	      struct diff_options *opt)
 {
-	while (t1->size | t2->size) {
-		if (nr_paths && t1->size && !interesting(t1, base)) {
-			update_tree_entry(t1);
+	struct tree_entry_list *t1 = NULL, *t2 = NULL;
+	if (tree1) {
+		parse_tree(tree1);
+		t1 = tree1->entries;
+	}
+	if (tree2) {
+		parse_tree(tree2);
+		t2 = tree2->entries;
+	}
+	while (t1 || t2) {
+		if (nr_paths && t1 && !interesting(t1, base)) {
+			t1 = t1->next;
 			continue;
 		}
-		if (nr_paths && t2->size && !interesting(t2, base)) {
-			update_tree_entry(t2);
+		if (nr_paths && t2 && !interesting(t2, base)) {
+			t2 = t2->next;
 			continue;
 		}
-		if (!t1->size) {
+		if (!t1) {
 			show_entry(opt, "+", t2, base);
-			update_tree_entry(t2);
+			t2 = t2->next;
 			continue;
 		}
-		if (!t2->size) {
+		if (!t2) {
 			show_entry(opt, "-", t1, base);
-			update_tree_entry(t1);
+			t1 = t1->next;
 			continue;
 		}
 		switch (compare_tree_entry(t1, t2, base, opt)) {
 		case -1:
-			update_tree_entry(t1);
+			t1 = t1->next;
 			continue;
 		case 0:
-			update_tree_entry(t1);
+			t1 = t1->next;
 			/* Fallthrough */
 		case 1:
-			update_tree_entry(t2);
+			t2 = t2->next;
 			continue;
 		}
 		die("git-diff-tree: internal error");
@@ -230,21 +222,16 @@ int diff_tree(struct tree_desc *t1, stru
 
 int diff_tree_sha1(const unsigned char *old, const unsigned char *new, const char *base, struct diff_options *opt)
 {
-	void *tree1, *tree2;
-	struct tree_desc t1, t2;
+	struct tree *t1, *t2;
 	int retval;
 
-	tree1 = read_object_with_reference(old, "tree", &t1.size, NULL);
-	if (!tree1)
+	t1 = parse_tree_indirect(old);
+	if (!t1)
 		die("unable to read source tree (%s)", sha1_to_hex(old));
-	tree2 = read_object_with_reference(new, "tree", &t2.size, NULL);
-	if (!tree2)
+	t2 = parse_tree_indirect(new);
+	if (!t2)
 		die("unable to read destination tree (%s)", sha1_to_hex(new));
-	t1.buf = tree1;
-	t2.buf = tree2;
-	retval = diff_tree(&t1, &t2, base, opt);
-	free(tree1);
-	free(tree2);
+	retval = diff_tree(t1, t2, base, opt);
 	return retval;
 }
 
-- 
1.0.GIT

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] Remove more parsers
  2006-01-29 19:04 [PATCH 0/3] Remove more parsers Daniel Barkalow
                   ` (2 preceding siblings ...)
  2006-01-29 19:05 ` [PATCH 3/3] Use struct tree in diff-tree Daniel Barkalow
@ 2006-01-29 20:26 ` Junio C Hamano
  2006-01-29 22:05   ` Daniel Barkalow
  3 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2006-01-29 20:26 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

I do not have objections to git-tar-tree changes, but I am
hesitant to use the tree parser in diff-tree due to its memory
retention behaviour.  We already use the commit parser in
diff-tree but I think we currently do not ask for the tree part
of the object to be parsed.  I suspect this patch would badly
interact with long-running "diff-tree --stdin", which is the
workhorse of whatchanged.  I haven't benched it though, and I'd
be happy to see the impact is proven to be negligible.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] Remove more parsers
  2006-01-29 20:26 ` [PATCH 0/3] Remove more parsers Junio C Hamano
@ 2006-01-29 22:05   ` Daniel Barkalow
  2006-01-31 17:16     ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Barkalow @ 2006-01-29 22:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sun, 29 Jan 2006, Junio C Hamano wrote:

> I do not have objections to git-tar-tree changes, but I am
> hesitant to use the tree parser in diff-tree due to its memory
> retention behaviour.  We already use the commit parser in
> diff-tree but I think we currently do not ask for the tree part
> of the object to be parsed.  I suspect this patch would badly
> interact with long-running "diff-tree --stdin", which is the
> workhorse of whatchanged.  I haven't benched it though, and I'd
> be happy to see the impact is proven to be negligible.

I'll look into discarding the struct trees after use (since we're not 
keeping flags on them, or storing references to them long-term), so we can 
use the same parser without worse memory behavior. It does seem to take a 
bunch more memory (and, oddly, be very slow) as I currently have it.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Use struct tree in diff-tree
  2006-01-29 19:05 ` [PATCH 3/3] Use struct tree in diff-tree Daniel Barkalow
@ 2006-01-31 16:53   ` Linus Torvalds
  2006-01-31 21:20     ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2006-01-31 16:53 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git, Junio C Hamano



On Sun, 29 Jan 2006, Daniel Barkalow wrote:
>
> It had been open-coding a tree parser. This updates the programs that
> call diff_tree() to send it the struct tree instead of a buffer and
> size.

Please don't.

parse_tree() is extremely broken, and expensive. 

The "struct tree_desc" is a much better abstraction, and avoids all 
overhead. Yes, it's slightly more opaque, and the interfaces could be 
improved: for example, instead of having a

	desc.buf = read_object_with_reference(new, "tree", &desc.size, NULL);
	if (!desc.buf)
		die("unable to read tree");

it might make make sense to introduce a function that does this for you, 
ie just a

	if (populate_tree_descriptor(new, &desc) < 0)
		die("unable to read tree");
	...
	free_tree_descriptor(&desc);

which is perhaps more readable and maintainable.

The "diff_tree()" functions are _extremely_ performance-critical, arguably 
more so than _any_ other part of git. Diffing two trees is one of _the_ 
most common operations, especially so when you want to follow just a 
subset of files with "git-rev-list -- <filename>*", and it's extremely 
important that you don't do malloc()/free() all the time.

So using "struct tree" and the general tree-parsing functions is _wrong_. 
Really REALLY wrong.

>From what I can tell, your version doesn't even do the "free()". Which 
probably means that not only is it slower, but I bet that if you have a 
big repository like the kernel, and you do a slightly more complex 
git-rev-list with multiple files, you'll use up tons and tons and tons of 
memory.

Junio, please don't apply this.

		Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] Remove more parsers
  2006-01-29 22:05   ` Daniel Barkalow
@ 2006-01-31 17:16     ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2006-01-31 17:16 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, git



On Sun, 29 Jan 2006, Daniel Barkalow wrote:
> 
> I'll look into discarding the struct trees after use (since we're not 
> keeping flags on them, or storing references to them long-term), so we can 
> use the same parser without worse memory behavior. It does seem to take a 
> bunch more memory (and, oddly, be very slow) as I currently have it.

Really, trying to use "struct tree" just is _broken_.

Even if you start freeing the memory, performance will absolutely _suck_. 
You'll be using a "malloc()" (and eventually, a "free()") and copying the 
tree entries around for absolutely zero gain. You'll make things follow 
pointers and have indirection, instead of just walking the data structure 
directly.

IOW, it will be about ten times more expensive in CPU time, in the most 
performance-critical part of git.

I would actually argue that if you want to use a common structure, try to 
convert existing users of "struct tree" to the "struct tree_desc" 
iterators. Yes, their use is a little awkward, but there's really no way 
you can use anything but the fast ones for tree diffing.

The "struct tree_desc" can only be used for traversing a tree linearly in 
one order, but I bet that nobody really ever does anything else.

To make "struct tree_desc" more generic, you'd obviously have to export 
the "update_tree_entry()" and "extract()" functions (and the latter would 
probably need to be re-named to "extract_tree_entry()").

Oh, and to make freeign the tree entry a bit simpler, you'd probably want 
to replace

	void *buf
	unsigned int size;

with a

	const void *const tree_base;
	const unsigned int tree_size;
	unsigned int offset;

so that you'd just update "offset" and leave tree_base/size untouched (to 
make freeing easier - right now the person who populates the "struct 
tree_desc" has to free the thing by hand).

		Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Use struct tree in diff-tree
  2006-01-31 16:53   ` Linus Torvalds
@ 2006-01-31 21:20     ` Junio C Hamano
  2006-01-31 21:49       ` Daniel Barkalow
  2006-01-31 22:07       ` Linus Torvalds
  0 siblings, 2 replies; 11+ messages in thread
From: Junio C Hamano @ 2006-01-31 21:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Barkalow, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Sun, 29 Jan 2006, Daniel Barkalow wrote:
>>
>> It had been open-coding a tree parser. This updates the programs that
>> call diff_tree() to send it the struct tree instead of a buffer and
>> size.
>
> Please don't.
> ...
> Junio, please don't apply this.
>
> 		Linus

I haven't, and I won't.  From my gut feeling I did not even
place it in "pu".  After timing it myself and then looking at
the code I agree with your analysis.

The one to git-tar-tree I've already applied, mostly because I
was not careful enough and especially I did not care enough
about performance of that program.  On my slow machine the tip
of kernel before you came back takes 9.2 seconds wallclock as
opposed to 8.7 seconds to tar up, so the patch degrades the
performance by about 5%.  Maybe we would want to revert that one
as well.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Use struct tree in diff-tree
  2006-01-31 21:20     ` Junio C Hamano
@ 2006-01-31 21:49       ` Daniel Barkalow
  2006-01-31 22:07       ` Linus Torvalds
  1 sibling, 0 replies; 11+ messages in thread
From: Daniel Barkalow @ 2006-01-31 21:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

On Tue, 31 Jan 2006, Junio C Hamano wrote:

> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > On Sun, 29 Jan 2006, Daniel Barkalow wrote:
> >>
> >> It had been open-coding a tree parser. This updates the programs that
> >> call diff_tree() to send it the struct tree instead of a buffer and
> >> size.
> >
> > Please don't.
> > ...
> > Junio, please don't apply this.
> >
> > 		Linus
> 
> I haven't, and I won't.  From my gut feeling I did not even
> place it in "pu".  After timing it myself and then looking at
> the code I agree with your analysis.
> 
> The one to git-tar-tree I've already applied, mostly because I
> was not careful enough and especially I did not care enough
> about performance of that program.  On my slow machine the tip
> of kernel before you came back takes 9.2 seconds wallclock as
> opposed to 8.7 seconds to tar up, so the patch degrades the
> performance by about 5%.  Maybe we would want to revert that one
> as well.

Probably a better solution is to move the tree parser from tree-diff.c 
into tree.c, provide a clear API in tree.h, and use that in tar-tree. 
There's definitely no need to have another parser beyond the one 
currently in tree-diff and the struct tree one.

I think I should be able to change the struct tree API slightly to make it 
an iterator instead of a linked list, and get the efficiency benefits of 
the tree-diff parser while still having a nice API.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Use struct tree in diff-tree
  2006-01-31 21:20     ` Junio C Hamano
  2006-01-31 21:49       ` Daniel Barkalow
@ 2006-01-31 22:07       ` Linus Torvalds
  1 sibling, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2006-01-31 22:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Barkalow, git



On Tue, 31 Jan 2006, Junio C Hamano wrote:
> 
> The one to git-tar-tree I've already applied, mostly because I
> was not careful enough and especially I did not care enough
> about performance of that program.  On my slow machine the tip
> of kernel before you came back takes 9.2 seconds wallclock as
> opposed to 8.7 seconds to tar up, so the patch degrades the
> performance by about 5%.  Maybe we would want to revert that one
> as well.

Hmm. Rather than revert it outright, it might be better to make it use the 
nicer parsing functions and "struct tree_desc".

It shouldn't look _that_ different from the "struct tree" version: instead 
of doing

	item = item->next;

it would do

	update_tree_entry(tree);

instead.

Give me a minute, I'll send you patches.

		Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2006-01-31 22:08 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-01-29 19:04 [PATCH 0/3] Remove more parsers Daniel Barkalow
2006-01-29 19:04 ` [PATCH 1/3] Use struct tree in tar-tree Daniel Barkalow
2006-01-29 19:05 ` [PATCH 2/3] Use struct commit " Daniel Barkalow
2006-01-29 19:05 ` [PATCH 3/3] Use struct tree in diff-tree Daniel Barkalow
2006-01-31 16:53   ` Linus Torvalds
2006-01-31 21:20     ` Junio C Hamano
2006-01-31 21:49       ` Daniel Barkalow
2006-01-31 22:07       ` Linus Torvalds
2006-01-29 20:26 ` [PATCH 0/3] Remove more parsers Junio C Hamano
2006-01-29 22:05   ` Daniel Barkalow
2006-01-31 17:16     ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).