git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org, Franck Bui-Huu <vagabon.xyz@gmail.com>
Subject: Re: [PATCH][RFC] Add git-archive-tree
Date: Wed, 06 Sep 2006 20:05:18 +0200	[thread overview]
Message-ID: <44FF0DDE.7030700@lsrfire.ath.cx> (raw)
In-Reply-To: <7vwt8mx8lb.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano schrieb:
> Rene Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> 
>> Currently git-archive-tree -f tar is slower than git-tar-tree.  This is
>> because it is welded to the side of the existing code to minimize patch
>> size, and I also suspect read_tree_recursive() to be quite a bit slower
>> than builtin-tar-tree.c::traverse_tree().
> 
> Yes, I suspect "struct object" and friends are very inefficient
> to use for things like this.  "struct tree_desc" based traverser
> is much preferred.

Turns out the reason for git-archive-tree -f tar being 10% slower than
git-tar-tree was a stupid memory leak in write_tar_entry(). *blush*
It caused lots of brk() calls (i.e. system calls).

In order to simplify measurement, I commented out the body of
write_entry(), which both git-archive-tree and git-tar-tree are calling
to write their output.  The rest left is basically two pure tree
traversers, git-archive-tree using read_tree_recursive() and git-tar-tree
using its struct tree_desc based traverse_tree().

I then let the two chew away on the kernel repository.  And as
kcachegrind impressively shows, all we do with our trees and objects is
dwarfed by inflate().  In both cases more than 96.6% of the cost lies
within libz.  That's not too surprising, because archivers need to
decompress _all_ objects, not only trees.

So for git-archive we can pretty much chose which traverser to use based
on convenience.

As a second experiment I wrote a struct tree_desc based traverser plus a
matching read_tree_recursive() compatibility wrapper (included below for
reference, not for inclusion) and compared the performance of
'git-ls-tree -r -t' on the kernel repository with and without it.  The
result is that the relative cost of all functions from tree.c combined
decreased from 0.93% to 0.66%.  Ugh.

So while a struct tree_desc based traverser can be significantly faster
than read_tree_recursive(), as soon as you actually start to do something
to the trees that difference pales to insignificance.

René



diff --git a/tree.c b/tree.c
index ea386e5..977a4aa 100644
--- a/tree.c
+++ b/tree.c
@@ -4,6 +4,7 @@ #include "blob.h"
 #include "commit.h"
 #include "tag.h"
 #include "tree-walk.h"
+#include "strbuf.h"
 #include <stdlib.h>
 
 const char *tree_type = "tree";
@@ -227,3 +228,99 @@ struct tree *parse_tree_indirect(const u
 			parse_object(obj->sha1);
 	} while (1);
 }
+
+static int do_read_tree_recursive_light(struct tree_desc *desc,
+                                        struct strbuf *base,
+                                        const char **match, read_tree_fn_t fn)
+{
+	struct name_entry entry;
+	int err = 0;
+	int baselen = base->len;
+
+	while (tree_entry(desc, &entry)) {
+		if (!match_tree_entry(base->buf, base->len, entry.path, entry.mode, match))
+			continue;
+
+		err = fn(entry.sha1, base->buf, base->len, entry.path, entry.mode, 0);
+		switch (err) {
+		case 0:
+			continue;
+		case READ_TREE_RECURSIVE:
+			break;
+		default:
+			return -1;
+		}
+
+		if (S_ISDIR(entry.mode)) {
+			struct tree_desc subtree;
+			char type[20];
+			void *buf;
+			int newbaselen;
+
+			buf = read_sha1_file(entry.sha1, type, &subtree.size);
+			if (!buf)
+				return error("cannot read %s",
+				             sha1_to_hex(entry.sha1));
+			if (strcmp(type, tree_type)) {
+				free(buf);
+				return error("Object %s not a tree",
+				             sha1_to_hex(entry.sha1));
+			}
+			subtree.buf = buf;
+
+			newbaselen = baselen + entry.pathlen + 1;
+			if (newbaselen > base->alloc) {
+				base->buf = xrealloc(base->buf, newbaselen);
+				base->alloc = newbaselen;
+			}
+			memcpy(base->buf + baselen, entry.path, entry.pathlen);
+			base->buf[baselen + entry.pathlen] = '/';
+			base->len = newbaselen;
+
+			err = do_read_tree_recursive_light(&subtree,
+			                                   base,
+			                                   match, fn);
+			base->len = baselen;
+			free(buf);
+			if (err)
+				break;
+		}
+	}
+
+	return err;
+}
+
+int read_tree_recursive_light(struct tree *tree,
+                              const char *base, int baselen, int stage,
+                              const char **match, read_tree_fn_t fn)
+{
+	unsigned char *sha1 = tree->object.sha1;
+	struct tree_desc desc;
+	char type[20];
+	void *buf;
+	int err;
+	struct strbuf sb;
+
+	sb.buf = xmalloc(PATH_MAX);
+	sb.alloc = PATH_MAX;
+	sb.len = 0;
+	if (baselen > sb.alloc) {
+		sb.buf = xrealloc(sb.buf, baselen);
+		sb.alloc = baselen;
+	}
+	memcpy(sb.buf, base, baselen);
+	sb.len = baselen;
+
+	desc.buf = buf = read_sha1_file(sha1, type, &desc.size);
+	if (!buf)
+		return error("Could not read %s", sha1_to_hex(sha1));
+	if (strcmp(type, tree_type)) {
+		free(buf);
+		return error("Object %s not a tree", sha1_to_hex(sha1));
+	}
+
+	err = do_read_tree_recursive_light(&desc, &sb, match, fn);
+	free(buf);
+
+	return err;
+}
diff --git a/tree.h b/tree.h
index dd25c53..2294bc2 100644
--- a/tree.h
+++ b/tree.h
@@ -30,4 +30,8 @@ extern int read_tree_recursive(struct tr
 
 extern int read_tree(struct tree *tree, int stage, const char **paths);
 
+extern int read_tree_recursive_light(struct tree *tree,
+                                     const char *base, int baselen, int stage,
+                                     const char **match, read_tree_fn_t fn);
+
 #endif /* TREE_H */

  reply	other threads:[~2006-09-06 18:05 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-09-02 12:23 [PATCH][RFC] Add git-archive-tree Rene Scharfe
2006-09-02 12:37 ` [PATCH] Add support for tgz archive format Rene Scharfe
2006-09-02 13:10 ` [PATCH][RFC] Add git-archive-tree Rene Scharfe
2006-09-02 20:13   ` Franck Bui-Huu
2006-09-04 18:22     ` Rene Scharfe
2006-09-04 20:09       ` Junio C Hamano
2006-09-04 22:02         ` Rene Scharfe
2006-09-04 22:20           ` Junio C Hamano
2006-09-05 11:43             ` Franck Bui-Huu
2006-09-02 21:19   ` Junio C Hamano
2006-09-02 14:17 ` Rene Scharfe
2006-09-02 15:24 ` Franck Bui-Huu
2006-09-02 16:08   ` Rene Scharfe
2006-09-02 21:27 ` Junio C Hamano
2006-09-06 18:05   ` Rene Scharfe [this message]
2006-09-06 21:47     ` Junio C Hamano
2006-09-17 11:54       ` Rene Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=44FF0DDE.7030700@lsrfire.ath.cx \
    --to=rene.scharfe@lsrfire.ath.cx \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=vagabon.xyz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).